Lossless Selection Views
under Conditional Domain Constraints 

Ingo Feinerer, Enrico Franconi and Paolo Guagliardo 


Abstract —A set of views defined by selection queries splits a database relation into sub-relations, each containing a subset of the 
original rows. This decomposition into horizontal fragments is lossless when the initial relation can be reconstructed from the fragments 
by union. In this paper, we consider horizontal decomposition in a setting where some of the attributes in the database schema are 
interpreted over a specific domain, on which a set of special predicates and functions is defined. 

We study losslessness in the presence of integrity constraints on the database schema. We consider the class of conditional domain 
constraints (CDCs), which restrict the values that the interpreted attributes may take whenever a certain condition holds on the
non-interpreted ones, and investigate lossless horizontal decomposition under CDCs in isolation, as well as in combination with functional
and unary inclusion dependencies. 

Index Terms —selection, views, losslessness, constraints, CDC, consistency, separability 
1 Introduction

The problem of updating a database through a set
of views consists in propagating updates issued on 
the views to the underlying base relations over which 
the view relations are defined, so that the changes to 
the database reflect exactly those to the views. This is 
a classical problem in database research, known as the 
view update problem ([1], [2], [3]), which in recent years 
has received renewed and increasing attention ([4], [5], 
[6], [7], [8], [9]). 

View updates can be consistently propagated in an 
unambiguous way under the condition that the mapping 
between database and view relations is lossless, which 
means that not only do the view relations depend on the 
database relations, but also the converse is true. However,
just knowing that such an "inverse" dependency
exists is not yet sufficient to effectively propagate the 
changes from the views to the database. What is essential 
to know is how, in some constructive way, the database 
relations depend on the view relations. This amounts to 
being able to define each database relation in terms of 
the views by means of a query, in much the same way 
the latter are defined from the former [10]. In such a 
context, database decompositions [11] play an important 
role, because their losslessness is associated with the 
existence of an explicit reconstruction operator that, as the
name suggests, prescribes how a database relation can
be rebuilt from the pieces, called fragments, into which it
has been decomposed.

create view V1 as select * from R where DEP<>"ICT" and POS="Manager"
create view V2 as select * from R where BON<4000
create view V3 as select * from R where POS<>"Manager"

Figure 1. Selection views over a company database.

Lossless database decomposition is particularly relevant
in distributed settings, where fragments are scattered
over a number of sites (typically within a network), for 
the reason that it increases the throughput of the system 
by allowing the concurrent execution of transactions as 
well as the parallel execution of a single query as a set 
of subqueries operating on fragments [12]. 

Horizontal decomposition is the process of splitting a given
relation into sub-relations on the same attributes and
of the same arity, each containing a subset of the rows of 
the original relation. For example, consider the relation R 
shown in Figure 1, recording data about the employees 
of a company: their name (EMP), the department (DEP) 
and the position (POS) in which they are employed, and 
their income (e.g., euros per month) consisting of a fixed 
salary (SAL) plus a variable bonus (BON). In Figure 1, 
the relation R is decomposed into three fragments: V1 selects
the rows of R with employees working as managers
in departments other than ICT, V2 selects the rows of R
with employees who get strictly less than 4000 as bonus,
and V3 selects the rows of R with employees who do not
work as managers. This kind of decomposition is lossless
when the original relation can be reconstructed from the
fragments by union; in other words, the reconstruction
operator for horizontal decomposition is the union. In
the example of Figure 1, the set of views {V1, V2, V3}
constitutes a lossless horizontal decomposition of R, as
the union of V1, V2 and V3 contains all (and only) the
rows of R. Each proper subset of {V1, V2, V3} constitutes
a lossy decomposition of R, because each view selects at
least one row that is not selected by any of the others;
e.g., the union of V1 and V2 does not contain the third
row of R, (Linda, Finance, Consultant, 5000, 4000), which
is selected only by V3.

Observe that the horizontal decomposition specified 
by the definitions of views V1, V2 and V3 in Figure 1 is
lossless for the given relation R, but this is not the case
for every relation (over the same attributes). For instance,
the tuple (Sam, ICT, Manager, 6000, 5000) is not selected
by any of these views; indeed, every relation containing
a row for an employee who works as a manager in the
ICT department and receives a bonus greater than 4000
would not be losslessly decomposed by V1, V2 and V3.
In the presence of integrity constraints, however, things 
may be different, because some tuples, such as the one 
above, might not be allowed in the input relation. 
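The check described in the last two paragraphs can be sketched in Python. This is only an illustration under stated assumptions: the rows for John and Mary are hypothetical (the text quotes only Linda's row of R), and the names `fragment` and `is_lossless` are introduced here for the sketch.

```python
# Hypothetical sample rows (EMP, DEP, POS, SAL, BON); only Linda's row is quoted in the text.
R = {
    ("John", "Finance", "Manager", 6000, 4500),
    ("Mary", "ICT", "Manager", 3000, 2000),
    ("Linda", "Finance", "Consultant", 5000, 4000),
}

# Selection conditions of the views of Figure 1, as predicates over a row.
VIEWS = {
    "V1": lambda emp, dep, pos, sal, bon: dep != "ICT" and pos == "Manager",
    "V2": lambda emp, dep, pos, sal, bon: bon < 4000,
    "V3": lambda emp, dep, pos, sal, bon: pos != "Manager",
}

def fragment(relation, condition):
    """Rows of `relation` selected by `condition`."""
    return {row for row in relation if condition(*row)}

def is_lossless(relation, views):
    """True if the union of the fragments is exactly `relation`."""
    union = set()
    for condition in views.values():
        union |= fragment(relation, condition)
    return union == relation

sam = ("Sam", "ICT", "Manager", 6000, 5000)
print(is_lossless(R, VIEWS))          # True: each row is selected by some view
print(is_lossless(R | {sam}, VIEWS))  # False: Sam's row is selected by no view
```

With these sample rows, dropping any one of the three views also makes the decomposition lossy, since each row is selected by exactly one view.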

The study of horizontal decomposition ([13], [14], [15], 
[16]) has mostly focused on settings where data values 
can only be compared for equality. However, most real-world
applications make use of data values coming from
domains with a richer structure (e.g., ordering) on which 
a variety of other restrictions besides equality can be 
expressed (e.g., that of being within a range or above 
a threshold). Examples are the attributes SAL and BON 
of the relation in Figure 1, the dimensions, weights and 
prices in the database of a shipping company, or the various
amounts (credits, debits, interest and exchange rates,
etc.) recorded in a banking application. It is therefore of 
practical interest to consider a scenario where some of 
the attributes in the database schema are interpreted over 
a specific domain, such as the reals or the integers, on 
which a set of predicates (e.g., smaller/greater than) and 
functions (e.g., addition and subtraction) are defined, 
according to a first-order language $\mathcal{C}$.

In the present article, we consider horizontal decomposition
in a setting with interpreted attributes, in which
fragments are defined by selection queries consisting of a 
condition on the non-interpreted attributes, expressed by 
a Boolean combination of equalities, and a condition on 
the interpreted attributes, expressed by a formula in $\mathcal{C}$.
In particular, we study the losslessness (w.r.t. every input 
relation) of horizontal decompositions specified in this 
way, in the presence of integrity constraints on the database
schema. We work under the pure universal relation
assumption (URA) [11], that is, we restrict ourselves to a
database schema consisting of only one relation symbol,
as customary in the study of database decomposition. 

Contribution and Outline 

In Section 2, we introduce a class of integrity constraints
called conditional domain constraints (CDCs). By means of
a formula in $\mathcal{C}$, a CDC restricts the values that the
interpreted attributes can take whenever a certain condition is
satisfied by the non-interpreted ones. Depending on the
expressive power of $\mathcal{C}$, CDCs can capture constraints that
naturally arise in practice; for example, in the scenario of
Figure 1, it may be required that employees in the ICT
department have a total income (i.e., salary plus bonus)
of at most 5000, that employees working as managers get
a bonus of at least 2000, and that employees never receive
a bonus greater than their salary. These constraints
can be expressed as:

$\mathrm{DEP} = \text{"ICT"} \rightarrow \mathrm{SAL} + \mathrm{BON} \le 5000$ ; (1a)

$\mathrm{POS} = \text{"Manager"} \rightarrow \mathrm{BON} \ge 2000$ ; (1b)

$\mathrm{SAL} - \mathrm{BON} \ge 0$ . (1c)

As we shall see, the views of Figure 1 losslessly decompose
every relation satisfying the above CDCs.
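As a sanity check of this claim (not a proof), the following Python sketch enumerates tuples over a small grid and looks for one that satisfies the three constraints above yet is selected by no view of Figure 1. The grid of amounts and the placeholder value "Other" (standing for any department other than ICT, or any position other than Manager) are assumptions made for the illustration.

```python
from itertools import product

DEPS = ["ICT", "Other"]            # "Other": any department except ICT (assumed placeholder)
POSITIONS = ["Manager", "Other"]   # "Other": any position except Manager (assumed placeholder)
AMOUNTS = range(0, 10001, 500)     # assumed bounded grid of salaries and bonuses

def satisfies_cdcs(dep, pos, sal, bon):
    """Constraints (1a)-(1c), read as implications on a single tuple."""
    c1a = dep != "ICT" or sal + bon <= 5000
    c1b = pos != "Manager" or bon >= 2000
    c1c = sal - bon >= 0
    return c1a and c1b and c1c

def selected_by_some_view(dep, pos, sal, bon):
    """Disjunction of the selection conditions of V1, V2 and V3."""
    v1 = dep != "ICT" and pos == "Manager"
    v2 = bon < 4000
    v3 = pos != "Manager"
    return v1 or v2 or v3

legal = [t for t in product(DEPS, POSITIONS, AMOUNTS, AMOUNTS) if satisfies_cdcs(*t)]
uncovered = [t for t in legal if not selected_by_some_view(*t)]
print(len(uncovered))  # 0: no legal tuple on the grid escapes all three views
```

The reason is visible by hand as well: a tuple missed by all views is an ICT manager with a bonus of at least 4000, so by (1c) the salary is at least 4000 too, which contradicts (1a).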

In our investigation, we do not commit to any specific
language $\mathcal{C}$ and we simply assume that $\mathcal{C}$ is closed under
negation.

In Section 3, we characterise consistent sets of CDCs
in terms of satisfiability in $\mathcal{C}$. Whenever the satisfiability
of sets of formulae in $\mathcal{C}$ is decidable, our characterisation
directly gives a decision procedure for checking whether
a set of CDCs is consistent. This is the case, e.g., for
the so-called Unit Two Variable Per Inequality fragment of
linear arithmetic over the integers, whose formulae (referred
to as UTVPIs) contain at most two variables, each with
a unit coefficient, as well as for Boolean
combinations of such formulae. We prove that deciding
consistency is NP-complete for both of these languages.

In Section 4, we characterise lossless horizontal decomposition
under CDCs in terms of unsatisfiability in
$\mathcal{C}$. Whenever the satisfiability of sets of formulae in $\mathcal{C}$ is
decidable, this characterisation gives a decision procedure
for checking whether a horizontal decomposition is
lossless under CDCs. We show that this problem is
co-NP-complete when $\mathcal{C}$ is the language of either UTVPIs
or Boolean combinations of UTVPIs.

In Section 5, we study lossless horizontal decomposition
under CDCs in combination with traditional integrity
constraints. We show that functional dependencies
(FDs) do not interact with CDCs and can thus be allowed
without any restriction, whereas this is not the case for
unary inclusion dependencies (UINDs). We provide a
domain propagation rule to derive a set of CDCs that fully
captures the interaction between a given set of UINDs
and suitably restricted CDCs w.r.t. lossless horizontal
decomposition, which makes it possible to employ the
general technique for deciding losslessness also in the
presence of UINDs. In addition, we consider restricted
combinations of CDCs with both FDs and UINDs.

We conclude in Section 6 with a discussion of the results,
relevant related work and future research directions.

2 Preliminaries 

We start by introducing the necessary notation and notions
that will be used throughout the article. We assume
some familiarity with formal logic and its application to 
database theory. 

Basics. An n-tuple is an ordered list of n elements, where
n is a positive integer. We denote tuples by overlined
lowercase letters (e.g., $\bar{t}$) and we write them as
comma-separated sequences enclosed in parentheses; the k-th
element of a tuple $\bar{t}$ is denoted by $\bar{t}[k]$. For example, if
$\bar{t}$ is the 4-tuple (a, b, c, a), then $\bar{t}[3] = c$. An n-ary relation
on a set A, where n is called the arity of the relation, is
a set of n-tuples of elements of A.

A schema is a finite set $\mathbf{S}$ of relation symbols, also
called a relational signature. Each relation symbol S has a
positive arity |S| indicating the total number of positions
in S, which are partitioned into interpreted and
non-interpreted ones. Relation symbols of arity n are called
n-ary; we indicate that |S| = n by writing S/n.

Let dom be a possibly infinite set of arbitrary values,
and let idom be a set of values from a specific domain
(e.g., the integers $\mathbb{Z}$) on which a set of predicates (e.g.,
<) and functions (e.g., +) are defined, according to a
first-order language $\mathcal{C}$ closed under negation. An instance I
over a schema $\mathbf{S}$ associates each $S \in \mathbf{S}$ with a relation $S^I$
of appropriate arity on dom ∪ idom, called the extension
of S under I, such that the values for the interpreted and
non-interpreted positions of S are taken from idom and
dom, respectively. The set of elements of dom ∪ idom
occurring in an instance I is called the active domain of
I, denoted by adom(I). An instance is finite if its active
domain is, and all instances in this article are assumed
to be finite. A fact is given by the association, denoted by
$R(\bar{t})$, between a relation symbol R and a tuple $\bar{t}$ of values
of appropriate arity; an instance can be represented as a
set of facts.

Constraints. A language over a relational signature $\mathbf{S}$ is a
set of first-order logic (FOL) formulae over $\mathbf{S}$ with constants
dom ∪ idom under the standard name assumption
(i.e., the interpretation of each constant is the constant's
name itself). A formula in some language $\mathcal{L}$ is called an
$\mathcal{L}$-formula. The sets of constants and relation symbols
that occur in a formula $\varphi$ are denoted by const($\varphi$) and
sig($\varphi$), respectively; we extend const(·) and sig(·) to sets
of formulae in the natural way.

A constraint is a closed formula (that is, without free
variables) in some language. For a set $\Gamma$ of constraints,
we say that an instance I over sig($\Gamma$) is a model of (or
satisfies) $\Gamma$, and write $I \models \Gamma$, to indicate that the relational
structure $(\mathrm{adom}(I) \cup \mathrm{const}(\Gamma), I)$ makes every formula
in $\Gamma$ true under the standard FOL semantics. We write
$I \models \varphi$ as short for $I \models \{\varphi\}$, and say that I satisfies $\varphi$. A
set of constraints $\Gamma$ entails (or logically implies) a constraint
$\varphi$, written $\Gamma \models \varphi$, if every finite model of $\Gamma$ also satisfies
$\varphi$. All sets of constraints in this article are finite.

Propositional Theories. A propositional variable is a variable
whose value can be either T (true) or F (false). A
propositional formula is a Boolean combination of propositional
variables, including the two special propositional
variables $\top$ and $\bot$, whose values are always T and F,
respectively. A propositional theory is a set of propositional
formulae. We denote the set of propositional variables
occurring in a propositional formula P by var(P) and
we extend var(·) to propositional theories in the natural
way. A valuation of a set of propositional variables (also
called a truth-value assignment) assigns a truth-value (i.e.,
either T or F) to each propositional variable in the set.
The truth-value $\alpha(P)$ of a propositional formula P under
a valuation $\alpha$ of its propositional variables is determined
by the standard semantics of the Boolean connectives.
We say that $\alpha$ satisfies (or makes true) P, and write $\alpha \models P$,
if $\alpha(P) = T$. Given a propositional theory $\Pi$, a valuation $\alpha$
of var($\Pi$) satisfies $\Pi$, written $\alpha \models \Pi$, if $\alpha$ satisfies every
propositional formula in $\Pi$.

Horizontal Decomposition 

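We consider a source schema $\mathbf{R}$, consisting of a single
relation symbol R, and a decomposed schema $\mathbf{V}$, disjoint
from $\mathbf{R}$, of view symbols with the same arity as R. We
formally define horizontal decomposition as follows.

Definition 1. Let $\mathbf{R} = \{R\}$ and $\mathbf{V} = \{V_1, \dots, V_n\}$. Let $\Delta$
be a set of constraints over $\mathbf{R}$ and let $\Sigma$ be a set of exact view
definitions, one for each $V_i \in \mathbf{V}$, of the form
$\forall \bar{x} \,.\, V_i(\bar{x}) \leftrightarrow \varphi(\bar{x})$, where $\varphi$ is a safe-range$^1$ formula over $\mathbf{R}$.
Then, $\Sigma$ is a horizontal decomposition of $\mathbf{R}$ into $\mathbf{V}$ under $\Delta$ if
$\Delta \cup \Sigma \models \forall \bar{x} \,.\, V_i(\bar{x}) \rightarrow R(\bar{x})$ for every $V_i \in \mathbf{V}$. We say that
$\Sigma$ is lossless if $\Delta \cup \Sigma \models \forall \bar{x} \,.\, R(\bar{x}) \leftrightarrow V_1(\bar{x}) \lor \dots \lor V_n(\bar{x})$.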

For the sake of simplicity, w.l.o.g. we assume that the
first $\|R\|$ positions of R and of every $V \in \mathbf{V}$ are
non-interpreted, while the remaining $|R| - \|R\|$ positions are
interpreted. Under this assumption, instances over $\mathbf{R} \cup \mathbf{V}$
associate each relation symbol with a subset of the
Cartesian product $\mathrm{dom}^k \times \mathrm{idom}^{n-k}$, where $n = |R|$ and
$k = \|R\|$. Unless otherwise specified, when we speak of
a tuple $\bar{t}$ we implicitly assume that $\bar{t}$ is of arity n and
that the first k values of $\bar{t}$ are from dom while the rest
are from idom. W.l.o.g. we also assume that a variable
associated with the i-th position of R is named $x_i$ if $i \le k$,
and $y_{i-k}$ otherwise. By default, $\bar{x}$ and $\bar{y}$ denote the tuples
$(x_1, \dots, x_k)$ and $(y_1, \dots, y_{n-k})$, respectively.

Since every $\mathcal{C}$-formula is over variables associated with
interpreted positions, we write $\phi(\bar{y})$ to indicate that $\phi$ is a
$\mathcal{C}$-formula whose free variables are among the variables
in $\bar{y}$. For a tuple $\bar{t}_y$ of $n-k$ values from idom, we denote
by $\phi(\bar{t}_y)$ the result of replacing every occurrence in $\phi$ of
the free variable $y_i$ with the value $\bar{t}_y[i]$. We say that $\bar{t}_y$ is
a solution to $\phi$ if $\phi(\bar{t}_y)$ is true under the semantics of $\mathcal{C}$. In
such a case, we also say that the assignment $\beta$ associating
each $y_i$ with $\bar{t}_y[i]$ satisfies $\phi$, and we write $\beta \models \phi$.$^2$

1. For details on the syntactic notion of range restriction, corresponding
to the semantic notion of domain independence, refer to [11].

Source Constraints. The class of integrity constraints we
consider on the source schema $\mathbf{R}$ is that of conditional
domain constraints (CDCs), which restrict the admissible
values at interpreted positions by means of formulae in
$\mathcal{C}$, when a certain condition holds on the non-interpreted
ones. Formally, a CDC is a formula of the form

$\forall \bar{x}, \bar{y} \,.\, (R(\bar{x}, \bar{y}) \land \lambda(\bar{x})) \rightarrow \delta(\bar{y})$ , (2)

where $\lambda(\bar{x})$ is a Boolean combination of equalities $x = a$,
with x from $\bar{x}$ and a from dom, and $\delta(\bar{y}) \in \mathcal{C}$. We use $x \neq a$
as short for $\neg(x = a)$ and, for ease of notation, we write
(2) simply as $\lambda(\bar{x}) \rightarrow \delta(\bar{y})$. Here, we make use of a more
general variant of the CDCs introduced in [17], where the
condition $\lambda(\bar{x})$ was limited to a conjunction of possibly
negated equalities. In general, (2) is more expressive than
a CDC of the form used in [17], as in the latter negation
is allowed only atomically, that is, in front of equalities,
and so disjunction cannot be expressed. However, there
is no difference in expressivity between the two variants
when considering sets of CDCs, because the antecedent
of (2) can always be rewritten in disjunctive normal form
(DNF) and the CDC split into a set of CDCs having the
same consequent and each disjunct as antecedent. E.g.,
the CDC $x_1 = a \land (x_2 \neq b \lor x_3 = c) \rightarrow \delta(\bar{y})$ is equivalent
to the following set of CDCs:

$\{ x_1 = a \land x_2 \neq b \rightarrow \delta(\bar{y}),\ x_1 = a \land x_3 = c \rightarrow \delta(\bar{y}) \}$ .
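The equivalence used in this example can be checked mechanically on truth values. The sketch below is an illustration only: `e1`, `e2`, `e3` stand for the truth values of the equalities on positions 1, 2 and 3 (with constants a, b, c), `delta` for that of the consequent, and the function names are introduced here.

```python
from itertools import product

def original_cdc(e1, e2, e3, delta):
    """x1 = a AND (x2 != b OR x3 = c) -> delta, on truth values of the atoms."""
    antecedent = e1 and ((not e2) or e3)
    return (not antecedent) or delta

def split_cdcs(e1, e2, e3, delta):
    """Conjunction of the two CDCs obtained from the DNF of the antecedent."""
    first = (not (e1 and not e2)) or delta
    second = (not (e1 and e3)) or delta
    return first and second

# Compare the two readings on all 16 combinations of truth values.
equivalent = all(
    original_cdc(*v) == split_cdcs(*v)
    for v in product([False, True], repeat=4)
)
print(equivalent)  # True
```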

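Standard domain constraints on non-interpreted attributes,
of the form $R(\bar{x}, \bar{y}) \rightarrow x_i = a_1 \lor \dots \lor x_i = a_n$ with
$a_1, \dots, a_n \in$ dom, can be expressed by the two following
CDCs,$^3$ for some $\delta(\bar{y}) \in \mathcal{C}$:$^4$

$x_i \neq a_1 \land \dots \land x_i \neq a_n \rightarrow \delta(\bar{y})$ ,

$x_i \neq a_1 \land \dots \land x_i \neq a_n \rightarrow \neg\delta(\bar{y})$ .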

View Definitions. The view symbols in $\mathbf{V}$ are defined
by selection queries with conditions on both interpreted
and non-interpreted positions. Formally, each $V \in \mathbf{V}$ is
defined by a formula of the form

$\forall \bar{x}, \bar{y} \,.\, V(\bar{x}, \bar{y}) \leftrightarrow (R(\bar{x}, \bar{y}) \land \lambda(\bar{x}) \land \sigma(\bar{y}))$ , (3)

where $\lambda(\bar{x})$ is as in (2) and $\sigma(\bar{y}) \in \mathcal{C}$. In the following,
we write (3) simply as $V : \lambda(\bar{x}) \land \sigma(\bar{y})$. View definitions
of this form clearly generalise those in [17], where $\lambda(\bar{x})$
is limited to a conjunction of possibly negated equalities
and disjunction cannot be expressed. While this has an
impact on end-users, who can define more expressive
views, there is no difference between the two formalisms
w.r.t. losslessness, in that any view symbol V defined by
(3) can be split into a set of views, defined by formulae
of the form used in [17], that together select exactly the
same tuples as V; given $\lambda(\bar{x})$ in DNF, each of these view
definitions has the same selection condition as V on the
interpreted attributes and a disjunct of $\lambda(\bar{x})$ as selection
condition on the non-interpreted ones. For example, the
view $V : x_1 = a \land (x_2 \neq b \lor x_3 = c) \land \sigma(\bar{y})$ selects the same
tuples selected by the following set of views:

$\{ V_1 : x_1 = a \land x_2 \neq b \land \sigma(\bar{y}),\ V_2 : x_1 = a \land x_3 = c \land \sigma(\bar{y}) \}$ .

The technique we will present in Section 4 for checking
whether a set of selections of the form (3) is lossless can
also be applied when (some of) the selections have the
form $V : \lambda(\bar{x}) \lor \sigma(\bar{y})$ by considering, in place of each such
selection, the two selections $V' : \lambda(\bar{x})$ and $V'' : \sigma(\bar{y})$.

2. Sometimes, by abuse of terminology, we say that an assignment
$\beta$ is a solution to a $\mathcal{C}$-formula $\phi$, with the obvious meaning.

3. Repetition of variables in the antecedent of a CDC is allowed.

4. Recall that $\mathcal{C}$ is closed under negation, hence $\neg\delta(\bar{y}) \in \mathcal{C}$.

Running Example. To clarify the notation and illustrate
the concepts introduced so far, we now give an example
that will be used also in the rest of the article. It is based
on the source schema of Figure 1, the CDCs (1a)-(1c)
informally described in Section 1 and the views previously
specified in Figure 1 by means of SQL statements.

Example 1. Let R be a relation symbol of arity 5, whose
positions are associated with attributes EMP, DEP, POS,
SAL, BON, in this order, with the last two interpreted
over the integers. Differently from the example of Figure 1,
for simplicity we assume that salaries and bonuses
are given in thousands of euros/month. Let a = "ICT"
and b = "Manager", and consider the following set $\Delta$ of
CDCs:

$x_2 = a \rightarrow y_1 + y_2 \le 5$ ; (4a)

$x_3 = b \rightarrow y_2 \ge 2$ ; (4b)

$\top \rightarrow y_1 - y_2 \ge 0$ . (4c)

Let $\mathbf{V} = \{V_1, V_2, V_3\}$, and consider the horizontal
decomposition $\Sigma$ given by

$V_1 : x_2 \neq a \land x_3 = b$ ; $V_2 : y_2 < 4$ ; $V_3 : x_3 \neq b$ .

Specific Languages. The techniques we will present for
deciding whether a set of CDCs is consistent (Section 3)
and whether a horizontal decomposition is lossless under
CDCs (Section 4) give actual algorithms when satisfiability
in $\mathcal{C}$ is decidable; in the case of losslessness, $\mathcal{C}$ is
additionally required to be closed under negation. Thus,
even though our investigation is in general independent
of the choice of $\mathcal{C}$, from a practical point of view it makes
sense to consider concrete languages that enjoy both of
the above properties. Two prominent such languages are
Unit Two-Variable Per Inequality formulae (UTVPIs) and
Boolean combinations thereof. UTVPIs, a.k.a. Generalised
2SAT (G2SAT) formulae [18], are a fragment of linear
arithmetic over the integers. Formally, a UTVPI formula
has the form $ax + by \le d$, where x and y are integer
variables, $a, b \in \{-1, 0, 1\}$ and $d \in \mathbb{Z}$. The following
equivalences hold:

$ax + by \ge d \iff (-a)x + (-b)y \le (-d)$ ; (5a)

$ax + by < d \iff ax + by \le (d - 1)$ ; (5b)

$ax + by > d \iff ax + by \ge (d + 1)$ . (5c)
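The equivalences (5a)-(5c) can be read as a normalisation procedure. The following Python sketch, which assumes the canonical UTVPI form $ax + by \le d$ over the integers, rewrites any of the four comparisons into that form; the names `to_le_form` and `holds` are introduced here for illustration.

```python
import operator

OPS = {"<=": operator.le, "<": operator.lt, ">=": operator.ge, ">": operator.gt}

def to_le_form(a, b, op, d):
    """Return (a2, b2, d2) with: a*x + b*y op d  iff  a2*x + b2*y <= d2 (integers only)."""
    if op == "<=":
        return a, b, d
    if op == "<":                      # (5b)
        return a, b, d - 1
    if op == ">=":                     # (5a)
        return -a, -b, -d
    if op == ">":                      # (5c), then (5a)
        return -a, -b, -(d + 1)
    raise ValueError(f"unsupported comparison: {op}")

def holds(a, b, op, d, x, y):
    """Evaluate a*x + b*y op d for concrete integers x and y."""
    return OPS[op](a * x + b * y, d)

print(to_le_form(1, -1, ">", 0))  # (-1, 1, -1): x - y > 0 iff -x + y <= -1
```

Note that (5b) and (5c) rely on the variables ranging over the integers; they would not hold over the reals.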
Thus, UTVPIs can express comparisons between two
variables and between a variable and an integer, as well
as compare the sum or difference of two variables with
an integer. As integers can also represent real numbers
with fixed precision, UTVPIs may be sufficient for
most applications. The CDCs and the view definitions of
Example 1 can be expressed when $\mathcal{C}$ is the language of
UTVPIs.

Observe that the equality $x = y$, where y is a variable
or an integer, is not expressible within a single UTVPI;
instead, a set consisting of two UTVPIs, namely $x \le y$
and $x \ge y$, is required. Therefore, in the consequent of a
CDC, equality between variables or between a variable
and an integer is expressed as follows:

$\lambda(\bar{x}) \rightarrow y = z \iff \lambda(\bar{x}) \rightarrow y \le z \land y \ge z$

$\iff \{ \lambda(\bar{x}) \rightarrow y \le z,\ \lambda(\bar{x}) \rightarrow y \ge z \}$ ,

with z either a variable or an integer. Equality between
the sum or difference of two variables and an integer is
expressed in a similar way.
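A minimal sketch of this encoding, assuming UTVPIs are represented as triples `(a, b, d)` read as $ax + by \le d$ (a representation chosen here for illustration, not prescribed by the text):

```python
def equality_as_utvpis(a, b, d):
    """Two triples (a', b', d'), each read as a'*x + b'*y <= d',
    jointly equivalent to a*x + b*y = d."""
    return [(a, b, d), (-a, -b, -d)]

def satisfies_all(constraints, x, y):
    """True if the integers x, y satisfy every constraint in the list."""
    return all(a * x + b * y <= d for a, b, d in constraints)

# y = z, written as 1*y + (-1)*z = 0
pair = equality_as_utvpis(1, -1, 0)
print(pair)                       # [(1, -1, 0), (-1, 1, 0)]
print(satisfies_all(pair, 3, 3))  # True
print(satisfies_all(pair, 3, 4))  # False
```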

Whether a set of UTVPIs is satisfiable can be checked 
in polynomial time ([19], [20], [21]). We refer to a Boolean 
combination of UTVPIs as BUTVPI; deciding the satisfiability
of a set of BUTVPIs is NP-complete [22].

3 Consistent Sets of CDCs 

Before turning our attention to horizontal decomposition,
we first deal with the relevant problem of determining
whether a set of CDCs is consistent, that is, whether
it has a non-empty model.$^5$ It is important to make sure
that the integrity constraints over the source schema are
consistent, as every horizontal decomposition is
meaninglessly lossless when there are in fact no legal
non-empty relations to decompose.

In this section, we will characterise the consistency of
a set of CDCs in terms of satisfiability in $\mathcal{C}$, where $\mathcal{C}$ is
not required to be closed under negation. The consistency
problem for CDCs is the decision problem that takes as
input a set $\Delta$ of CDCs and answers the question: "Is $\Delta$
consistent?" We will show that when $\mathcal{C}$ is the language of
either UTVPIs or BUTVPIs this problem is NP-complete.
The technique employed here provides the basis for the
approach we follow in Section 4 in the study of lossless
horizontal decomposition.

Observe that, given their form, CDCs affect only one
tuple at a time, and so whether an instance satisfies a set
of CDCs depends on each tuple of the instance in isolation
from the others. Indeed, a set of CDCs is consistent
precisely if it is satisfiable on an instance consisting of
only one tuple, therefore we can restrict our attention to
single tuples. Moreover, we are not really interested in
the actual values of a tuple at non-interpreted positions;
what we need to know is simply whether such values
satisfy the conditions in the antecedent of each CDC or
not. To this end, with each equality between a variable
$x_i$ and a constant a we associate a propositional variable
$p_i^a$, whose truth-value indicates whether the value in the
i-th position is a. To each valuation of such propositional
variables corresponds the (possibly infinite) set of tuples
satisfying the equalities associated with the names of the
propositional variables. For example, a valuation assigning
true to $p_1^a$ and false to $p_2^b$ identifies all the tuples in
which the value of the first element is a and the value of
the second is different from b. A bit more care is needed
with valuations of propositional variables that refer to
the same position (i.e., have the same subscript) but to
different constants (i.e., have different superscripts). For
example, $p_i^a$ and $p_i^b$ (with $a \neq b$) should never be both
evaluated to true.

5. Since CDCs are universally-quantified closed implicational formulae,
any set thereof is always trivially satisfied by the empty instance.

As we shall see, checking whether a set $\Delta$ of CDCs is
consistent amounts to first building a propositional theory
by replacing the equalities with the corresponding
propositional variables, and then looking for a valuation
$\alpha$ such that:

• any two propositional variables referring to the same
position but to different constants are not evaluated
both to true; and

• the set of $\mathcal{C}$-formulae that "apply" under $\alpha$ is
satisfiable.

Definition 2. Let $\Delta = \{\phi_1, \dots, \phi_n\}$ be a set of CDCs over
$\mathbf{R}$. For each $\phi_i \in \Delta$, recalling it has the form (2), we construct

$\mathrm{prop}(\phi_i) = P \rightarrow v_i$ , (6)

where P is a propositional formula (possibly $\top$) obtained from
the condition $\lambda(\bar{x})$ in the antecedent of $\phi_i$ by replacing each
equality $x_i = a$ between a variable $x_i$ and $a \in$ dom with the
propositional variable $p_i^a$, and $v_i$ is a fresh propositional variable
associated with the $\mathcal{C}$-formula $\delta(\bar{y})$, denoted by $\mathrm{idf}(v_i)$,$^6$
in the consequent of $\phi_i$. We denote $\{\mathrm{prop}(\phi) \mid \phi \in \Delta\}$ by $\Pi_\Delta$
and we call it the propositional theory associated with $\Delta$.

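We consider the set $\mathrm{var}(\Pi_\Delta)$ of propositional variables
occurring in $\Pi_\Delta$ partitioned into
$\mathrm{var}_p(\Pi_\Delta) = \bigcup \{ \mathrm{var}(P) \mid (P \rightarrow v) \in \Pi_\Delta \}$
and $\mathrm{var}_v(\Pi_\Delta) = \mathrm{var}(\Pi_\Delta) \setminus \mathrm{var}_p(\Pi_\Delta)$.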

For a pair of distinct propositional variables $p_i^a$ and
$p_i^b$ associated with the same position i but distinct dom
constants a and b, we consider the propositional formula
$p_i^a \land p_i^b \rightarrow \bot$, called the axiom of unique value for $p_i^a$ and
$p_i^b$, intuitively stating that two distinct constants are not
allowed in the same position. The axioms of unique value
for a set of propositional variables consist of the axiom
of unique value for each pair of distinct propositional
variables $p_i^a$ and $p_i^b$ in the set. A tuple $\bar{t}$ is consistent with
a valuation $\alpha$ if, for every propositional variable $p_i^a$, it
holds that $\bar{t}[i] = a$ precisely if $\alpha(p_i^a) = T$. In general,
given a valuation $\alpha$ of a set of propositional variables, by
construction there exists a tuple consistent with $\alpha$ if and
only if $\alpha$ satisfies the corresponding axioms of unique
value for that set.
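The relationship between valuations, axioms of unique value and consistent tuples can be sketched as follows. In this illustration a propositional variable $p_i^a$ is represented as the pair `(i, a)` and a valuation as a dictionary; the helper names and the placeholder `fresh` are assumptions, the latter standing for some value of dom that differs from every constant mentioned in the valuation.

```python
from itertools import combinations

def unique_value_axioms(variables):
    """Pairs (p_i^a, p_i^b) with a != b; each pair stands for p_i^a AND p_i^b -> false."""
    return [(p, q) for p, q in combinations(sorted(variables), 2) if p[0] == q[0]]

def satisfies_axioms(alpha):
    """alpha maps variables (i, a) to True/False."""
    return not any(alpha[p] and alpha[q] for p, q in unique_value_axioms(alpha))

def consistent_tuple(alpha, k, fresh="<fresh>"):
    """Values for positions 1..k of a tuple consistent with alpha, or None.

    Assumption: `fresh` is a value of dom different from every constant in alpha.
    """
    if not satisfies_axioms(alpha):
        return None
    values = []
    for i in range(1, k + 1):
        chosen = [a for (j, a), truth in alpha.items() if j == i and truth]
        values.append(chosen[0] if chosen else fresh)
    return tuple(values)

alpha = {(1, "a"): True, (2, "b"): False}
print(consistent_tuple(alpha, 2))                             # ('a', '<fresh>')
print(consistent_tuple({(1, "a"): True, (1, "b"): True}, 2))  # None
```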

6. idf stands for "interpreted domain formula". 
Definition 3. Let $\Delta$ be a set of CDCs over $\mathbf{R}$. The auxiliary
theory $\Pi_{\mathrm{aux}}$ for $\Pi_\Delta$ consists of the axioms of unique value
for $\mathrm{var}_p(\Pi_\Delta)$.

Example 2. The propositional theory associated with $\Delta$
of Example 1 is

$\Pi_\Delta = \{ p_2^a \rightarrow v_1,\ p_3^b \rightarrow v_2,\ \top \rightarrow v_3 \}$ ,

where $\mathrm{var}_p(\Pi_\Delta) = \{p_2^a, p_3^b\}$ and $\mathrm{var}_v(\Pi_\Delta) = \{v_1, v_2, v_3\}$.
The auxiliary theory for $\Pi_\Delta$ is $\Pi_{\mathrm{aux}} = \emptyset$. The association
between the propositional variables in $\mathrm{var}_v(\Pi_\Delta)$ and the
set of UTVPIs from the CDCs in $\Delta$ is
$\mathrm{idf} = \{ v_1 \mapsto y_1 + y_2 \le 5,\ v_2 \mapsto y_2 \ge 2,\ v_3 \mapsto y_1 - y_2 \ge 0 \}$.

Given a set $\Delta$ of CDCs and a valuation $\alpha$ of $\mathrm{var}_p(\Pi_\Delta)$,
we say that a CDC $\phi \in \Delta$ is applicable under $\alpha$ if $\alpha$ makes
the l.h.s. of $\mathrm{prop}(\phi)$ true. We can use $\alpha$ to "filter" $\Pi_\Delta$ and
construct a set consisting of the consequent of each CDC
in $\Delta$ that is applicable under $\alpha$. This set contains the
$\mathcal{C}$-formulae that must be necessarily satisfied by the values
at interpreted positions of every tuple consistent with $\alpha$,
that is, whose values at non-interpreted positions satisfy
the antecedents of the CDCs applicable under $\alpha$.

Definition 4. Let $\Delta$ consist of CDCs over $\mathbf{R}$, and let $\alpha$ be a
valuation of $\mathrm{var}_p(\Pi_\Delta)$. The $\alpha$-filtering of $\Pi_\Delta$ is the set

$\Pi_\Delta^\alpha = \{ \mathrm{idf}(v) \mid (P \rightarrow v) \in \Pi_\Delta,\ \alpha(P) = T \}$ , (7)

consisting of $\mathcal{C}$-formulae associated with propositional variables
that occur in some propositional formula of $\Pi_\Delta$ whose
l.h.s. holds true under $\alpha$.
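The filtering step can be sketched in a few lines of Python on the running example. The encoding is an assumption made for illustration: each element of the propositional theory is a pair of a function computing the truth value of its l.h.s. under a valuation and the name of its r.h.s. variable, and the names `p2a`, `p3b` stand for $p_2^a$, $p_3^b$.

```python
# prop(phi) = P -> v for the CDCs (4a)-(4c); P is given as a function of the valuation.
PI_DELTA = [
    (lambda alpha: alpha["p2a"], "v1"),
    (lambda alpha: alpha["p3b"], "v2"),
    (lambda alpha: True, "v3"),          # the l.h.s. is the constant "true"
]

# idf: formulae over the interpreted positions, kept here as plain strings.
IDF = {"v1": "y1 + y2 <= 5", "v2": "y2 >= 2", "v3": "y1 - y2 >= 0"}

def filtering(pi_delta, idf, alpha):
    """The alpha-filtering: idf(v) for every P -> v whose l.h.s. is true under alpha."""
    return {idf[v] for lhs, v in pi_delta if lhs(alpha)}

alpha = {"p2a": True, "p3b": False}
print(sorted(filtering(PI_DELTA, IDF, alpha)))  # ['y1 + y2 <= 5', 'y1 - y2 >= 0']
```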

The main result of this section characterises the consistency
of a set of CDCs in terms of satisfiability in $\mathcal{C}$.
We remark again that the result holds in general for any
language $\mathcal{C}$, not necessarily closed under negation. This
requirement will become essential only in the upcoming
Section 4 and Section 5.

Theorem 1. Let $\Delta$ be a set of CDCs over $\mathbf{R}$, and let $\Pi_{\mathrm{aux}}$
be the auxiliary theory for $\Pi_\Delta$. Then, $\Delta$ is consistent if and
only if there exists a valuation $\alpha$ of $\mathrm{var}_p(\Pi_\Delta)$ satisfying $\Pi_{\mathrm{aux}}$
and such that $\Pi_\Delta^\alpha$ is satisfiable.

Whenever the satisfiability of sets of $\mathcal{C}$-formulae is decidable,
Theorem 1 gives an algorithm to check whether
a set of CDCs is consistent, as we illustrate below in our
running example, where $\mathcal{C}$ is the language of UTVPIs.

Example 3. With respect to $\Pi_\Delta$ of Example 2, consider
the valuation $\alpha = \{ p_2^a \mapsto T,\ p_3^b \mapsto F \}$, for which we have
$\Pi_\Delta^\alpha = \{ y_1 + y_2 \le 5,\ y_1 - y_2 \ge 0 \}$. Obviously, $\alpha$ satisfies
the (empty) auxiliary theory $\Pi_{\mathrm{aux}}$ for $\Pi_\Delta$. In addition,
$\Pi_\Delta^\alpha$ is satisfiable as, e.g., $\{ y_1 \mapsto 3,\ y_2 \mapsto 2 \}$ is a solution
to every UTVPI in it.
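The procedure suggested by Theorem 1 can be sketched on the running example as follows. This is an illustration under explicit assumptions, not the algorithm analysed in the text: satisfiability of the filtering is tested by brute force over a bounded window of integers (so a negative answer only means that no solution lies in the window), and all names are introduced here.

```python
from itertools import product

VARS_P = [(2, "ICT"), (3, "Manager")]   # stand for p_2^a and p_3^b
CDCS = [                                 # (l.h.s. under a valuation, consequent on y1, y2)
    (lambda al: al[(2, "ICT")],     lambda y1, y2: y1 + y2 <= 5),   # (4a)
    (lambda al: al[(3, "Manager")], lambda y1, y2: y2 >= 2),        # (4b)
    (lambda al: True,               lambda y1, y2: y1 - y2 >= 0),   # (4c)
]

def satisfies_aux(alpha):
    """Axioms of unique value: at most one true variable per position."""
    positions = [i for (i, _), truth in alpha.items() if truth]
    return len(positions) == len(set(positions))

def find_solution(formulas, window=range(-10, 11)):
    """Brute-force search in an assumed bounded window; None if nothing is found there."""
    for y1, y2 in product(window, repeat=2):
        if all(f(y1, y2) for f in formulas):
            return (y1, y2)
    return None

def is_consistent(vars_p, cdcs):
    """Look for a valuation satisfying the auxiliary theory whose filtering has a solution."""
    for values in product([True, False], repeat=len(vars_p)):
        alpha = dict(zip(vars_p, values))
        if not satisfies_aux(alpha):
            continue
        filtered = [cons for lhs, cons in cdcs if lhs(alpha)]
        if find_solution(filtered) is not None:
            return True
    return False

print(is_consistent(VARS_P, CDCS))  # True
```

A practical implementation would replace `find_solution` with a decision procedure for the chosen language, such as one of the polynomial-time UTVPI algorithms cited in Section 2.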

We will now give the proof of Theorem 1, for which
we first need to prove a technical lemma. Let $n = |R|$ and
$k = \|R\|$; with each tuple $\bar{t}$ is associated the assignment
$\beta : \{y_1, \dots, y_{n-k}\} \rightarrow \mathrm{idom}$, which we refer to as the
assignment induced by the interpreted positions of $\bar{t}$, such that
$\beta(y_{i-k}) = \bar{t}[i]$ for every $i \in \{k+1, \dots, n\}$. Intuitively, the
following lemma shows that any tuple that is consistent
with a valuation $\alpha$ satisfies a set of CDCs precisely if the
assignment induced by its interpreted positions satisfies
the $\alpha$-filtering.

Lemma 1. Let $\Delta$ be a set of CDCs over $\mathbf{R}$, and let $\alpha$ be a
valuation of $\mathrm{var}_p(\Pi_\Delta)$. Let $\bar{t}$ be consistent with $\alpha$, and let $\beta$
be the assignment induced by the interpreted positions of $\bar{t}$.
Then, $\{R(\bar{t})\} \models \Delta$ if and only if $\beta$ satisfies $\Pi_\Delta^\alpha$.

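Proof. Let $n = |R|$ and $k = \|R\|$, and let I denote the instance $\{R(\bar{t})\}$.

Claim 1. Let $\phi \in \Delta$ and $\mathrm{prop}(\phi) = P \rightarrow v$. Then,
$\alpha(P) = T$ iff $\lambda(\bar{x})$ is true under
$\{x_1 \mapsto \bar{t}[1], \dots, x_k \mapsto \bar{t}[k]\}$.

Proof. Since $\bar{t}$ is consistent with $\alpha$, for $i \in \{1, \dots, k\}$
we have that $\bar{t}[i] = a$ if and only if $\alpha(p_i^a) = T$.

Claim 2. For each $\mathrm{prop}(\phi) = P \rightarrow v$ with $\phi \in \Delta$, it
is the case that $I \not\models \phi$ if and only if $\alpha(P) = T$ and
$\beta \not\models \mathrm{idf}(v)$.

Proof. As $\phi$ is a CDC, $I \not\models \phi$ if and only if the antecedent
$\lambda(\bar{x})$ of $\phi$ holds true under $\{x_1 \mapsto \bar{t}[1], \dots, x_k \mapsto \bar{t}[k]\}$
and the consequent $\delta(\bar{y})$ of $\phi$ is not true under
$\{y_1 \mapsto \bar{t}[k+1], \dots, y_{n-k} \mapsto \bar{t}[n]\}$. In turn, this is the
case if and only if both $\alpha(P) = T$ (by Claim 1) and $\beta$
does not satisfy $\mathrm{idf}(v) = \delta(\bar{y})$ (by construction).

We prove Lemma 1 by showing that $I \not\models \Delta$ if and only
if $\beta$ does not satisfy $\Pi_\Delta^\alpha$.

"if". Assume $\beta \not\models \Pi_\Delta^\alpha$, that is, there is some $\mathcal{C}$-formula
$\psi \in \Pi_\Delta^\alpha$ not satisfied by $\beta$. By construction of $\Pi_\Delta^\alpha$, $\psi$ is the
consequent of a CDC $\phi \in \Delta$ such that $\mathrm{prop}(\phi) = P \rightarrow v$,
with $\psi = \mathrm{idf}(v)$ and $\alpha(P) = T$. Thus, as $\beta$ does not satisfy
$\psi$, by Claim 2 $I \not\models \phi$, and therefore $I \not\models \Delta$.

"only if". Assume $I \not\models \Delta$. Then, there exists some $\phi \in \Delta$
which is not satisfied by I. Since $\mathrm{prop}(\phi) = P \rightarrow v$, by
Claim 2 $\alpha(P) = T$ and $\beta \not\models \mathrm{idf}(v)$. Hence, $\mathrm{idf}(v) \in \Pi_\Delta^\alpha$.
Therefore, $\beta \not\models \Pi_\Delta^\alpha$. □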

Proof of Theorem 1. Let $n = |R|$ and $k = \|R\|$.

"if". Let $\alpha$ and $\beta$ be such that $\alpha \models \Pi_{\mathrm{aux}}$ and $\beta \models \Pi_\Delta^\alpha$. Then, as $\alpha \models \Pi_{\mathrm{aux}}$, it is never the case that two distinct propositional variables in $\mathrm{var}_p(\Pi_\Delta)$ associated with the same position are both true under $\alpha$. Thus, there exists a tuple $t$ consistent with $\alpha$ and such that $\beta$ is the assignment induced by its interpreted positions. Therefore, as $\beta \models \Pi_\Delta^\alpha$, the instance $\{R(t)\}$ is a model of $\Delta$ by Lemma 1.

"only if". Assume that $\Delta$ is consistent, that is, it has a non-empty model. In particular, as every formula in $\Delta$ is in one tuple, there is a tuple $t$ such that the instance $I = \{R(t)\}$ is a model of $\Delta$. Take $\alpha$ as follows: for every propositional variable $p \in \mathrm{var}_p(\Pi_\Delta)$, $\alpha(p) = \mathrm{T}$ if $p = p_i^a$ and $t[i] = a$, otherwise $\alpha(p) = \mathrm{F}$. By construction, $\alpha \models \Pi_{\mathrm{aux}}$ and $t$ is consistent with $\alpha$. Therefore, as $I \models \Delta$, the assignment $\beta$ induced by the interpreted positions of $t$ satisfies $\Pi_\Delta^\alpha$ by Lemma 1. □

The satisfiability problem for $\mathcal{L}$ takes as input a set $\Gamma$ of $\mathcal{L}$-formulae and answers the question: "Is $\Gamma$ satisfiable?"

Lemma 2. The satisfiability problem for $\mathcal{L}$ linearly reduces to the consistency problem for CDCs.

Proof. Let $\Gamma = \{\phi_1, \ldots, \phi_n\}$ be a set of $\mathcal{L}$-formulae. Then, take $\Delta = \{\top \to \phi_i \mid \phi_i \in \Gamma\}$, let $\Pi_\Delta = \{\top \to v_i \mid i = 1, \ldots, n\}$ and $\mathrm{idf} = \{v_i \mapsto \phi_i \mid i = 1, \ldots, n\}$. The auxiliary theory for $\Pi_\Delta$ is $\Pi_{\mathrm{aux}} = \emptyset$. As $\mathrm{var}_p(\Pi_\Delta) = \emptyset$, the only valuation of $\mathrm{var}_p(\Pi_\Delta)$ is $\alpha = \emptyset$, which satisfies $\Pi_{\mathrm{aux}}$ and for which $\Pi_\Delta^\alpha = \{\mathrm{idf}(v_i) \mid v_i \in \mathrm{var}_v(\Pi_\Delta)\} = \Gamma$. Thus, by Theorem 1, the set $\Delta$ of CDCs is consistent iff $\Gamma$ is satisfiable. The reduction is linear in the size of $\Gamma$. □

With regard to the consistency problem for CDCs 
whose consequents are either UTVPIs or BUTVPIs, we 
have the following complexity results. 

Theorem 2. When $\mathcal{L}$ is the language of either BUTVPIs or UTVPIs, the consistency problem for CDCs is NP-complete.

Proof. Constructing the propositional theories $\Pi_\Delta$ and $\Pi_{\mathrm{aux}}$ requires linear time, checking that a valuation $\alpha$ of $\mathrm{var}_p(\Pi_\Delta)$ satisfies $\Pi_{\mathrm{aux}}$ takes polynomial time, and checking that an assignment from the variables in $\bar{y}$ to integers satisfies $\Pi_\Delta^\alpha$ (whose construction takes linear time) can be done in polynomial time, whether $\Pi_\Delta^\alpha$ consists of UTVPIs or BUTVPIs. Hence, in light of Theorem 1, we can verify a given solution to the consistency problem, when $\mathcal{L}$ is the language of either UTVPIs or BUTVPIs, in polynomial time, so the problem is in NP in both cases. The NP-hardness of the consistency problem when $\mathcal{L}$ is the language of BUTVPIs follows by Lemma 2 from the fact that the satisfiability problem for BUTVPIs is NP-hard.

We will show that the consistency problem is NP-hard when $\mathcal{L}$ is the language of UTVPIs by a reduction from SAT. Given an instance of SAT as a set $\Phi = \{C_1, \ldots, C_n\}$ of clauses over (possibly negated) literals $L_1, \ldots, L_k$, we will construct a set $\Delta$ of CDCs (whose consequents are UTVPIs) that is consistent if and only if $\Phi$ is satisfiable. To this end, consider a relation symbol $R$ of arity $k+1$, with the last position interpreted over the integers. With each literal $L_i$ we associate the equality $x_i = a$, and with each clause $C_j$ we associate the CDC:

$$\Bigl(\bigwedge_{L_i \in C_j} x_i \neq a\Bigr) \wedge \Bigl(\bigwedge_{\neg L_i \in C_j} x_i = a\Bigr) \rightarrow y_1 > 0 \qquad (8)$$

Then, let $\Delta$ consist of $\top \to y_1 \le 0$ and the CDCs of the form (8) associated with each clause. The propositional theory associated with $\Delta$ is

$$\Pi_\Delta = \{\top \to v_1\} \cup \{P_j \to v_2 \mid 1 \le j \le n\},$$

where $v_1 \mapsto y_1 \le 0$, $v_2 \mapsto y_1 > 0$ and

$$P_j = \Bigl(\bigwedge_{L_i \in C_j} \neg p_i^a\Bigr) \wedge \Bigl(\bigwedge_{\neg L_i \in C_j} p_i^a\Bigr). \qquad (9)$$

For every valuation $\alpha$ of $\mathrm{var}_p(\Pi_\Delta)$, the $\alpha$-filtering $\Pi_\Delta^\alpha$ of $\Pi_\Delta$ is either $\{y_1 \le 0\}$ or $\{y_1 \le 0,\ y_1 > 0\}$. Since the latter set is unsatisfiable, by Theorem 1 $\Delta$ is consistent if and only if there exists a valuation $\alpha$ such that, for every $j \in \{1, \ldots, n\}$, $P_j$ does not hold true under $\alpha$. Clearly, to each valuation $\alpha$ of $\mathrm{var}_p(\Pi_\Delta)$ there corresponds a valuation $\alpha'$ of $L_1, \ldots, L_k$, and vice versa, such that $\alpha(p_i^a) = \mathrm{T}$ if and only if $\alpha'(L_i) = \mathrm{T}$; in turn, $P_j$ is true under $\alpha$ if and only if $C_j$ is false under $\alpha'$. Thus, $\Delta$ is consistent if and only if $\Phi$ is satisfiable. Therefore, since the given reduction is obviously polynomial, the claim follows. □
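
The correspondence used in the reduction can be checked on small inputs with the following Python sketch. It is only an illustration under assumed encodings: clauses are lists of non-zero integers in the DIMACS style (`i` for $L_i$, `-i` for $\neg L_i$), the variable $p_i^a$ is identified with the index `i`, and both functions are brute-force.

```python
from itertools import product

def antecedent(clause):
    """P_j as in (9): literal i yields 'not p_i', literal -i yields 'p_i'."""
    return [(abs(lit), lit < 0) for lit in clause]

def holds(P, alpha):
    """True iff every signed variable of P has the required truth value."""
    return all(alpha[var] == polarity for var, polarity in P)

def reduction_consistent(clauses, k):
    """Delta is consistent iff some valuation makes every P_j false,
    so that the filtering reduces to the satisfiable set {y1 <= 0}."""
    for bits in product([True, False], repeat=k):
        alpha = dict(zip(range(1, k + 1), bits))
        if not any(holds(antecedent(c), alpha) for c in clauses):
            return True
    return False

def brute_force_sat(clauses, k):
    """Plain truth-table satisfiability test, used only for comparison."""
    for bits in product([True, False], repeat=k):
        alpha = dict(zip(range(1, k + 1), bits))
        if all(any(alpha[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False

for clauses, k in ([[1, -2], [2]], 2), ([[1], [-1]], 1):
    print(reduction_consistent(clauses, k) == brute_force_sat(clauses, k))
```

Both comparisons agree because $P_j$ is exactly the negation of the clause $C_j$ under the identification of $p_i^a$ with $L_i$.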

4 Lossless Selections under CDCs 

The technique described in the previous section can be suitably extended and applied for checking whether a set of selection views of the form (3) is lossless under CDCs, that is, whether every source relation satisfying the given CDCs can be reconstructed by union from the fragments into which it is decomposed by the given view definitions.

In this section, we will characterise lossless horizontal decomposition in terms of unsatisfiability in $\mathcal{L}$, where $\mathcal{L}$ is closed under negation. The losslessness problem in $\mathcal{L}$ is the decision problem that takes as input a horizontal decomposition $\Sigma$ specified by selections of the form (3) and a set $\Delta$ of CDCs and answers the question: "Is $\Sigma$ lossless under $\Delta$?" We will show that this problem is co-NP-complete when $\mathcal{L}$ is the language of either UTVPIs or BUTVPIs. For these languages, our characterisation provides an exponential-time algorithm for deciding the losslessness of $\Sigma$ under $\Delta$, by means of a number of unsatisfiability checks in $\mathcal{L}$ which is exponentially bounded by the size of $\Delta$.

By definition, a horizontal decomposition $\Sigma$ of $R$ into $V_1, \ldots, V_n$ is lossless under a set $\Delta$ of CDCs over $R$ if $R^I = V_1^I \cup \cdots \cup V_n^I$ for every model $I$ of $\Delta \cup \Sigma$. As the extension of each view symbol is always included in the extension of $R$, the problem is equivalent to checking that there is no model $I$ of $\Delta \cup \Sigma$ where a tuple $t \in R^I$ does not belong to any $V_i^I$. In turn, this means that for each definition in $\Sigma$, which has the form (3), the values in $t$ at non-interpreted positions do not satisfy $\lambda$, or the values in $t$ at interpreted positions do not satisfy the $\mathcal{L}$-formula $\sigma$.

The formulae in $\Sigma$ apply to one tuple at a time and, as already observed in Section 3, so do CDCs; therefore we can again focus on single tuples. With each equality we associate, as before, a propositional variable whose truth-value determines whether the equality is satisfied. Given a valuation $\alpha$, we consider the set consisting of the $\mathcal{L}$-formulae in the r.h.s. of all the CDCs that are applicable under $\alpha$ and the negation of the selection condition $\sigma(\bar{y})$ of each view definition in $\Sigma$ whose selection condition $\lambda(\bar{x})$ is satisfied by $\alpha$. Then, checking losslessness is equivalent to checking that there exists no valuation $\alpha$ for which the above set of $\mathcal{L}$-formulae is satisfiable. Indeed, from such a valuation and the corresponding assignment of values from $\mathbf{idom}$ satisfying the relevant $\mathcal{L}$-formulae, we can obtain a tuple that provides a counterexample to losslessness.

Similarly to what we did in Section 3 for sets of CDCs, 
we build a propositional theory associated with a given 
horizontal decomposition. 
Definition 5. Let $\Sigma = \{\phi_1, \ldots, \phi_n\}$ be a horizontal decomposition. For each $\phi_i \in \Sigma$, which has the form (3), we build

$$\mathrm{prop}(\phi_i) = P \rightarrow v_i', \qquad (10)$$

in which $v_i'$ is either a fresh propositional variable associated (by means of $\mathrm{idf}$) with the $\mathcal{L}$-formula $\sigma(\bar{y})$, if any, occurring in $\phi_i$, or $\bot$ otherwise.$^{7}$ We denote $\{\mathrm{prop}(\phi) \mid \phi \in \Sigma\}$ by $\Pi_\Sigma$ and we call it the propositional theory associated with $\Sigma$.

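We consider the set $\mathrm{var}(\Pi_\Sigma)$ of propositional variables occurring in $\Pi_\Sigma$ partitioned into $\mathrm{var}_p(\Pi_\Sigma) = \bigcup \{\mathrm{var}(P) \mid (P \to v_i') \in \Pi_\Sigma\}$ and $\mathrm{var}_v(\Pi_\Sigma) = \mathrm{var}(\Pi_\Sigma) \setminus \mathrm{var}_p(\Pi_\Sigma)$.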

Given a set $\Delta$ of CDCs over $R$ and a horizontal decomposition $\Sigma$ of $R$, the propositional theory associated with $\Delta \cup \Sigma$ is $\Pi = \Pi_\Delta \cup \Pi_\Sigma$, where $\Pi_\Delta$ and $\Pi_\Sigma$ are the propositional theories of Definition 2 and Definition 5 associated with $\Delta$ and $\Sigma$, respectively. The set $\mathrm{var}(\Pi) = \mathrm{var}(\Pi_\Delta) \cup \mathrm{var}(\Pi_\Sigma)$ of propositional variables occurring in $\Pi$ is partitioned into $\mathrm{var}_p(\Pi) = \mathrm{var}_p(\Pi_\Delta) \cup \mathrm{var}_p(\Pi_\Sigma)$ and $\mathrm{var}_v(\Pi) = \mathrm{var}_v(\Pi_\Delta) \cup \mathrm{var}_v(\Pi_\Sigma)$.

Definition 6. Let $\Delta$ be a set of CDCs over $R$ and let $\Sigma$ be a horizontal decomposition of $R$. The auxiliary theory $\Pi_{\mathrm{aux}}$ for $\Pi = \Pi_\Delta \cup \Pi_\Sigma$ consists of the propositional formulae in $\Pi_\Sigma$ whose r.h.s. is $\bot$ and the axioms of unique value for $\mathrm{var}_p(\Pi)$.

Observe that the above is a proper extension of Definition 3: whenever $\Sigma$ is empty, the auxiliary theory for $\Pi$ coincides with the auxiliary theory for $\Pi_\Delta$.

Example 4. The propositional theory associated with $\Sigma$
of Example 1 is Ids = {^P% Ap| — F _L, T — F v 2 , ~^p\ —F _L}. 
Let $\Pi = \Pi_\Delta \cup \Pi_\Sigma$, where $\Pi_\Delta$ is the propositional theory
already given in Example 2. The association between the
propositional variables in $\mathrm{var}_v(\Pi)$ and UTVPIs is $\mathrm{idf}$ of
Example 2 extended with $v_2' \mapsto y_2 \le 4$, and the auxiliary
theory for II is ld aux = {- 1 p 2 A p\ —F 1, —F _L}. 

Definition 7. Let $\Sigma$ be a horizontal decomposition, and let $\alpha$ be a valuation of $\mathrm{var}_p(\Pi_\Sigma)$. The $\alpha$-filtering of $\Pi_\Sigma$ is the set

$$\Pi_\Sigma^\alpha = \{\, \neg\,\mathrm{idf}(v') \mid (P \to v') \in \Pi_\Sigma,\ \alpha(P) = \mathrm{T},\ v' \neq \bot \,\}, \qquad (11)$$

consisting of the negation of the $\mathcal{L}$-formulae associated with propositional variables that occur in some propositional formula of $\Pi_\Sigma$ whose l.h.s. holds true under $\alpha$.

Observe that in (11), differently from (7), $\mathcal{L}$-formulae are negated. This is because a counter-instance $I$ to losslessness is such that $V_1^I \cup \cdots \cup V_n^I = \emptyset$ and $R^I$ has only one tuple; therefore, whenever the formula $\lambda(\bar{x})$ in the selection that defines a view symbol is satisfied by $I$, the $\mathcal{L}$-formula $\sigma(\bar{y})$, if any, is not. On the other hand, the $\mathcal{L}$-formula in the consequent of a CDC must hold whenever the condition in the antecedent is satisfied.

For a valuation $\alpha$ of $\mathrm{var}_p(\Pi)$, the $\alpha$-filtering of $\Pi$ is the set $\Pi^\alpha = \Pi_\Delta^\alpha \cup \Pi_\Sigma^\alpha$, which, as $\mathcal{L}$ is closed under negation, consists of $\mathcal{L}$-formulae.

7. This is because the constraints in $\Sigma$ may not specify an $\mathcal{L}$-formula.


The main result of this section is the following characterisation of lossless horizontal decomposition in terms of unsatisfiability in $\mathcal{L}$.

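Theorem 3. Let $\Sigma$ be a horizontal decomposition of $R$, let $\Delta$ be a set of CDCs over $R$, and let $\Pi_{\mathrm{aux}}$ be the auxiliary theory for $\Pi = \Pi_\Delta \cup \Pi_\Sigma$. Then, $\Sigma$ is lossless under $\Delta$ if and only if the $\alpha$-filtering $\Pi^\alpha = \Pi_\Delta^\alpha \cup \Pi_\Sigma^\alpha$ of $\Pi$ is unsatisfiable for every valuation $\alpha$ of $\mathrm{var}_p(\Pi)$ satisfying $\Pi_{\mathrm{aux}}$.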

Whenever the satisfiability of $\mathcal{L}$-formulae is decidable, Theorem 3 provides an algorithm for deciding whether a given horizontal decomposition is lossless. We illustrate this in our running example with UTVPIs.

Example 5. Consider $\Pi$ and $\Pi_{\mathrm{aux}}$ from Example 4. The
only valuation of var p (II) satisfying II aux is a = {p 2 HF 
T, p\ 1 —F T}, for which the a-filtering of II is 

$$\Pi^\alpha = \underbrace{\{\, y_1 + y_2 \le 5,\ y_2 \ge 2,\ y_1 - y_2 \ge 0 \,\}}_{\Pi_\Delta^\alpha} \cup \underbrace{\{\, y_2 > 4 \,\}}_{\Pi_\Sigma^\alpha}.$$

Note that $y_2 > 4$ in $\Pi_\Sigma^\alpha$ is $\neg\,\mathrm{idf}(v_2')$, that is, the negation of $y_2 \le 4$. The set $\Pi^\alpha = \Pi_\Delta^\alpha \cup \Pi_\Sigma^\alpha$ is unsatisfiable because from $y_1 + y_2 \le 5$ and $y_2 > 4$ we get $y_1 < 1$, which together with $y_1 - y_2 \ge 0$ yields $y_2 < 1$, in conflict with $y_2 \ge 2$. So, the horizontal decomposition $\Sigma$ is lossless under $\Delta$.$^{8}$
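
The check described by Theorem 3 can be sketched in Python along the following lines. The encoding is hypothetical and chosen only for this illustration: a propositional variable $p_i^a$ is a pair `(i, a)`, an antecedent is a list of signed variables, formulae over the interpreted positions are Python predicates, a view without a condition on interpreted positions carries `None`, and `satisfiable` is a bounded brute-force search rather than a genuine UTVPI solver.

```python
from itertools import product

def holds(P, alpha):
    """P is a list of (variable, polarity) literals; the empty list encodes T."""
    return all(alpha[var] == polarity for var, polarity in P)

def satisfies_unique_value(alpha):
    """Axioms of unique value: two true variables never share a position."""
    positions = [pos for (pos, _const), value in alpha.items() if value]
    return len(positions) == len(set(positions))

def satisfiable(formulas, names, lo=-10, hi=10):
    """Bounded brute-force stand-in for a real solver (assumption of the sketch)."""
    for values in product(range(lo, hi + 1), repeat=len(names)):
        beta = dict(zip(names, values))
        if all(f(beta) for f in formulas):
            return True
    return False

def lossless(cdcs, views, names):
    """cdcs: (P, delta) pairs; views: (P, sigma) pairs, sigma None if absent."""
    variables = sorted({var for P, _ in cdcs + views for var, _ in P})
    for bits in product([True, False], repeat=len(variables)):
        alpha = dict(zip(variables, bits))
        if not satisfies_unique_value(alpha):
            continue
        # Auxiliary formulae P -> bottom: such a view must not be applicable.
        if any(sigma is None and holds(P, alpha) for P, sigma in views):
            continue
        filtering = [delta for P, delta in cdcs if holds(P, alpha)]
        filtering += [(lambda b, s=sigma: not s(b))
                      for P, sigma in views
                      if sigma is not None and holds(P, alpha)]
        if satisfiable(filtering, names):
            return False  # a one-tuple counterexample exists
    return True

# Hypothetical input: CDC T -> y1 > 3 and a single view V: y1 > 3.
cdc = [([], lambda b: b["y1"] > 3)]
view = [([], lambda b: b["y1"] > 3)]
print(lossless(cdc, view, ["y1"]), lossless([], view, ["y1"]))
```

With the CDC in place the filtering contains a formula and its negation, so no counterexample tuple exists; without it the negated selection condition alone is satisfiable and the decomposition is lossy.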

We will now give the proof of Theorem 3, for which we first need to prove two additional lemmas. In the following, and in the rest of the article, let $\varphi$ denote the formula $\forall \bar{x}, \bar{y}\,.\, R(\bar{x}, \bar{y}) \leftrightarrow \bigvee_{V \in \mathcal{V}} V(\bar{x}, \bar{y})$, and recall that a horizontal decomposition $\Sigma$ is lossless under $\Delta$ if and only if $\Delta \cup \Sigma \models \varphi$. We start by showing that, when $\Delta$ consists of CDCs, $\Delta \cup \Sigma$ does not entail $\varphi$ precisely if there is a counterexample to it with only one tuple.

Lemma 3. Let $\Sigma$ be a horizontal decomposition of $R$ and let $\Delta$ be a set of CDCs over $R$. Then, $\Delta \cup \Sigma \not\models \varphi$ if and only if there exists a tuple $t$ such that the instance $I = \{R(t)\}$ is a model of $\Delta \cup \Sigma$.

Proof. The "if" is trivial. For the "only if", assume that $\Delta \cup \Sigma \not\models \varphi$. Then, there exists a model $J$ of $\Delta \cup \Sigma$ such that $J \not\models \varphi$, that is, $R^J \neq V_1^J \cup \cdots \cup V_n^J$. The extension of each $V_i$ always contains a subset of the tuples in the extension of $R$ under every instance, hence there must be $t \in R^J$ such that $t \notin V_i^J$ for every $i \in \{1, \ldots, n\}$. Let $I = \{R(t)\}$; as every constraint in $\Delta \cup \Sigma$ is in one tuple and $J \models \Delta \cup \Sigma$, we have that $I \models \Delta \cup \Sigma$. □

The next lemma is more technical: intuitively, it shows that any tuple that is consistent with a valuation $\alpha$ satisfying the auxiliary theory provides a counterexample to losslessness if and only if the assignment induced by its interpreted positions satisfies the $\alpha$-filtering.

8. In the scenario of our running example it would make sense to require salaries and bonuses to be non-negative quantities, which can be done by consistently adding the CDCs $\top \to y_1 \ge 0$ and $\top \to y_2 \ge 0$ without affecting the losslessness of the decomposition.

Lemma 4. Let $\Sigma$ be a horizontal decomposition of $R$, let $\Delta$ be a set of CDCs over $R$, and let $\Pi_{\mathrm{aux}}$ be the auxiliary theory for $\Pi = \Pi_\Delta \cup \Pi_\Sigma$. Let $\alpha$ be a valuation of $\mathrm{var}_p(\Pi)$, let $t$ be a tuple consistent with $\alpha$, and let $\beta$ be the assignment induced by the interpreted positions of $t$. Whenever $\alpha \models \Pi_{\mathrm{aux}}$, we have that $\{R(t)\} \models \Delta \cup \Sigma$ if and only if $\beta$ satisfies $\Pi^\alpha$.

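Proof. Let $n = |R|$ and $k = \|R\|$, and let $I = \{R(t)\}$.

Claim 1. Let $\mathrm{prop}(\phi) = P \to v'$, with $\phi \in \Sigma$. Then, $I \not\models \phi$ iff $\alpha(P) = \mathrm{T}$ and, whenever $v' \neq \bot$, $\beta \models \mathrm{idf}(v')$.

Proof. Since $\phi \in \Sigma$ has the form (3), $I \not\models \phi$ iff $\lambda(\bar{x})$ is true under $\{x_1 \mapsto t[1], \ldots, x_k \mapsto t[k]\}$ and $\sigma(\bar{y})$, if any, holds true under $\{y_1 \mapsto t[k+1], \ldots, y_{n-k} \mapsto t[n]\}$. As $v' \neq \bot$ iff $\phi$ contains an $\mathcal{L}$-formula $\sigma(\bar{y}) = \mathrm{idf}(v')$, the claim follows by construction of $\alpha$ and $\beta$.

Assume $\alpha \models \Pi_{\mathrm{aux}}$. We will show that $I \not\models \Delta \cup \Sigma$ if and only if $\beta$ does not satisfy $\Pi^\alpha$.

"if". Assume $\beta \not\models \Pi^\alpha$. Then, there is an $\mathcal{L}$-formula $\psi \in \Pi^\alpha$ that is not satisfied by $\beta$. By construction of $\Pi^\alpha$, either $\psi$ or its negation appears in some $\phi \in \Delta \cup \Sigma$, depending on whether $\phi \in \Delta$ or $\phi \in \Sigma$, respectively. If $\phi \in \Delta$, then $\mathrm{prop}(\phi) = P \to v$ with $\alpha(P) = \mathrm{T}$, so $I \not\models \phi$ (by Claim 2 in the proof of Lemma 1). If $\phi \in \Sigma$, then $\mathrm{prop}(\phi) = P \to v'$ with $v' \neq \bot$ and $\alpha(P) = \mathrm{T}$, hence $I \not\models \phi$ by Claim 1. In either case $I \not\models \Delta \cup \Sigma$.

"only if". Assume $I \not\models \Delta \cup \Sigma$. Then, there is some $\phi \in \Delta \cup \Sigma$ that is not satisfied by $I$. If $\phi$ is in $\Delta$, by Lemma 1 $\beta \not\models \Pi_\Delta^\alpha$, hence $\beta \not\models \Pi^\alpha$. If $\phi$ is in $\Sigma$, $\mathrm{prop}(\phi) = P \to v'$; as $I \not\models \phi$, by Claim 1 $\alpha(P) = \mathrm{T}$ and $v' \neq \bot$ implies $\beta \models \mathrm{idf}(v')$. Suppose $v' = \bot$, then $\mathrm{prop}(\phi)$ is in $\Pi_{\mathrm{aux}}$ and, since $\alpha \models \Pi_{\mathrm{aux}}$, we obtain $\alpha(P) = \mathrm{F}$, which is a contradiction. So, $v' \neq \bot$ and $\beta \models \mathrm{idf}(v')$. In turn, we have that $\beta \not\models \neg\,\mathrm{idf}(v')$ and $\neg\,\mathrm{idf}(v') \in \Pi_\Sigma^\alpha$. Therefore, $\beta \not\models \Pi^\alpha$. □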

Proof of Theorem 3. Let $n = |R|$ and $k = \|R\|$. We will show that $\Delta \cup \Sigma \not\models \varphi$ if and only if there exist $\alpha$ and $\beta$ satisfying $\Pi_{\mathrm{aux}}$ and $\Pi^\alpha$, respectively.

"if". Let $\alpha$ and $\beta$ be such that $\alpha \models \Pi_{\mathrm{aux}}$ and $\beta \models \Pi^\alpha$. Since $\alpha \models \Pi_{\mathrm{aux}}$, no two distinct propositional variables in $\mathrm{var}_p(\Pi)$ associated with the same position are both true under $\alpha$. Hence, there is a tuple $t$ consistent with $\alpha$ and such that $\beta$ is the assignment induced by its interpreted positions. So, the instance $I = \{R(t)\}$ is a model of $\Delta \cup \Sigma$ by Lemma 4. Thus, as $I \not\models \varphi$, $\Delta \cup \Sigma \not\models \varphi$ by Lemma 3.

"only if". Assume that $\Delta \cup \Sigma \not\models \varphi$. By Lemma 3, there exists a tuple $t$ such that $I = \{R(t)\}$ is a model of $\Delta \cup \Sigma$. Let $\beta$ be the assignment induced by the interpreted positions of $t$, and let $\alpha$ be the valuation such that, for each $p \in \mathrm{var}_p(\Pi)$, $\alpha(p) = \mathrm{T}$ if $p = p_i^a$ and $t[i] = a$, and $\alpha(p) = \mathrm{F}$ otherwise. We will show that $\alpha$ satisfies $\Pi_{\mathrm{aux}}$ and, in turn, $\beta \models \Pi^\alpha$ by Lemma 4, since $I \models \Delta \cup \Sigma$. By construction, $\alpha$ satisfies every propositional formula in $\Pi_{\mathrm{aux}}$ of the form $p_i^a \wedge p_i^b \to \bot$ with $p_i^a, p_i^b \in \mathrm{var}_p(\Pi)$ and $p_i^a \neq p_i^b$. All other propositional formulae in $\Pi_{\mathrm{aux}}$ have the form $\mathrm{prop}(\phi) = P \to \bot$, where $\phi$ is a constraint in $\Sigma$ that does not contain an $\mathcal{L}$-formula $\sigma(\bar{y})$. As $I \models \Delta \cup \Sigma$, the condition $\lambda(\bar{x})$ in each such $\phi \in \Sigma$ is not true under $\{x_1 \mapsto t[1], \ldots, x_k \mapsto t[k]\}$. Therefore, $\mathrm{prop}(\phi) = P \to \bot$ is true under $\alpha$ as $\alpha(P) = \mathrm{F}$ by construction of $\alpha$. □

The unsatisfiability problem for $\mathcal{L}$ is the complement of the satisfiability problem for $\mathcal{L}$.

Lemma 5. The unsatisfiability problem for $\mathcal{L}$ linearly reduces to the losslessness problem in $\mathcal{L}$.

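Proof. Let $\Gamma = \{\phi_1, \ldots, \phi_n\}$ be a set of $\mathcal{L}$-formulae. We will show how to construct a horizontal decomposition that is lossless under $\Delta = \emptyset$ precisely if $\Gamma$ is unsatisfiable. To this end, take $\Sigma = \{\, V_i\colon \neg\phi_i \mid \phi_i \in \Gamma \,\}$ and observe that, as $\mathcal{L}$ is closed under negation, $\neg\phi_i \in \mathcal{L}$. Thus, $\Sigma$ consists of selections of the form (3), where $\sigma = \neg\phi_i$ and $\lambda = \top$. Therefore, $\Sigma$ is indeed a horizontal decomposition.

Let $\Pi = \Pi_\Delta \cup \Pi_\Sigma = \emptyset \cup \{\, \top \to v_i' \mid i = 1, \ldots, n \,\}$ for which $\mathrm{idf} = \{\, v_i' \mapsto \neg\phi_i \mid \phi_i \in \Gamma \,\}$. Then, the auxiliary theory for $\Pi$ is $\Pi_{\mathrm{aux}} = \emptyset$. Since $\mathrm{var}_p(\Pi) = \emptyset$, the only valuation of $\mathrm{var}_p(\Pi)$ is $\alpha = \emptyset$, which satisfies $\Pi_{\mathrm{aux}}$ and for which $\Pi^\alpha = \{\, \neg\,\mathrm{idf}(v_i') \mid v_i' \in \mathrm{var}_v(\Pi_\Sigma) \,\} = \Gamma$. Therefore, by Theorem 3, $\Sigma$ is lossless under $\Delta = \emptyset$ if and only if $\Gamma$ is unsatisfiable. The reduction is linear in the size of $\Gamma$. □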

With regard to the losslessness problem in the languages of UTVPIs and BUTVPIs, we have the following complexity results.

Theorem 4. When $\mathcal{L}$ is the language of either BUTVPIs or UTVPIs, the losslessness problem in $\mathcal{L}$ is co-NP-complete.

Proof. Constructing the propositional theories $\Pi$ and $\Pi_{\mathrm{aux}}$ takes linear time, checking whether a valuation $\alpha$ of $\mathrm{var}_p(\Pi)$ satisfies $\Pi_{\mathrm{aux}}$ requires polynomial time, and checking that an assignment of integers to the variables in $\bar{y}$ satisfies $\Pi^\alpha$ (whose construction takes linear time) can be done in polynomial time, whether $\Pi^\alpha$ consists of UTVPIs or BUTVPIs. Hence, in light of Theorem 3, we can verify a given solution to the complement of the losslessness problem, in either language, in polynomial time. Therefore, the losslessness problem is in co-NP in both cases.

The co-NP-hardness in the case of BUTVPIs follows by Lemma 5 from the fact that the satisfiability problem for BUTVPI-formulae is NP-hard and so its complement is, in turn, co-NP-hard.

We will show the co-NP-hardness of the losslessness problem when $\mathcal{L}$ is the language of UTVPIs by a reduction from UNSAT. The reduction is quite similar to the one given in the proof of Theorem 2 for showing the NP-hardness of the consistency problem for CDCs when $\mathcal{L}$ is the language of UTVPIs. Given a set $\Phi = \{C_1, \ldots, C_n\}$ of clauses over possibly negated literals $L_1, \ldots, L_k$, we will build a set $\Delta$ of CDCs (whose consequents are UTVPIs) and a horizontal decomposition $\Sigma$ (where the selection conditions on the interpreted positions are UTVPIs) such that $\Sigma$ is lossless under $\Delta$ if and only if $\Phi$ is unsatisfiable. To this end, consider a source relation symbol $R$ of arity $k+1$, with the last position interpreted over the integers, and the view symbol $V$. With each literal $L_i$ we associate the equality $x_i = a$, and with each clause $C_j$ we associate the CDC (8). Let $\Delta$ consist of the CDCs of the form (8) associated with each clause, and let $\Sigma$ be the horizontal decomposition specified by $V\colon y_1 > 0$. The propositional theory associated with $\Delta \cup \Sigma$ is

$$\Pi = \{\, P_j \to v \mid 1 \le j \le n \,\} \cup \{\, \top \to v' \,\},$$

with $v \mapsto y_1 > 0$, $v' \mapsto y_1 > 0$ and $P_j$ as in (9). For every valuation $\alpha$ of $\mathrm{var}_p(\Pi)$, the $\alpha$-filtering $\Pi^\alpha$ of $\Pi$ is either $\{y_1 > 0,\ y_1 \le 0\}$ or $\{y_1 \le 0\}$, depending on whether $\Delta$ contains a CDC that is applicable under $\alpha$. As the latter set is satisfiable, by Theorem 3 we get that $\Sigma$ is lossless under $\Delta$ if and only if for every valuation $\alpha$ there exists $j \in \{1, \ldots, n\}$ such that $P_j$ holds true under $\alpha$. Clearly, each valuation $\alpha$ of $\mathrm{var}_p(\Pi)$ corresponds to a valuation $\alpha'$ of $L_1, \ldots, L_k$, and vice versa, such that $\alpha(p_i^a) = \mathrm{T}$ if and only if $\alpha'(L_i) = \mathrm{T}$; in turn, $P_j$ is true under $\alpha$ if and only if $C_j$ is false under $\alpha'$. Thus, $\Sigma$ is lossless under $\Delta$ if and only if $\Phi$ is unsatisfiable. Therefore, since the given reduction is obviously polynomial, the claim follows. □

5 Adding FDs and UINDs 

So far, we have considered lossless horizontal decomposition under CDCs in isolation; in this section, we extend our study to the case in which the integrity constraints over the source schema are combinations of CDCs with more traditional database constraints. This investigation is vital to understand whether, how and to what extent the techniques we described in Section 4 can be applied to existing database schemas on which a set of integrity constraints other than CDCs is already defined.

Here, we focus on two well-known classes of integrity constraints, namely functional dependencies (FDs) and unary inclusion dependencies (UINDs) [11]. Under certain restrictions, as we shall see, their interaction with CDCs can be fully captured, w.r.t. lossless horizontal decomposition, in terms of CDCs. It is important to remark that we consider restrictions solely on the CDCs, so that existing integrity constraints need not be modified in any way in order to allow for CDCs.

Let us recall that an instance $I$ satisfies a UIND $R[i] \subseteq R[j]$ if every value in the $i$-th column of $R^I$ appears in the $j$-th column of $R^I$. The following example shows that, if we allow CDCs together with constraints from another class, such as UINDs, their interaction may influence the losslessness of horizontal decomposition.

Example 6. Let $R$ and $V$ be relation symbols of arity 2, whose positions are interpreted over the integers. Let $\Sigma$ be the horizontal decomposition defined by $V\colon y_1 > 3$, and let $\Delta$ be a set of integrity constraints on $R$ consisting of the CDC $\top \to y_2 > 3$ and the UIND $R[1] \subseteq R[2]$. It is easy to see that $\Delta$ entails $\top \to y_1 > 3$. Therefore, $\Sigma$ is lossless under $\Delta$ because $V$ selects all of the tuples in $R$, which is clearly not the case without the UIND.

We now introduce a general property, separability, that will constitute the main technical tool for the subsequent analysis of combinations of CDCs with FDs and UINDs. Informally, a class of constraints is separable from CDCs if, after making explicit the result of their interaction, which is captured by a suitable set of inference rules, we can disregard constraints from that class and focus solely on CDCs, as far as lossless horizontal decomposition is concerned.

In what follows, for a set $\Delta$ of constraints we denote by $\mathrm{cdc}(\Delta)$ the maximal subset of $\Delta$ consisting solely of CDCs.

Definition 8 (Separability). Let $\mathcal{C}$ be a class of integrity constraints, let $\mathcal{S}$ be a finite set of sound inference rules$^{9}$ for $\mathcal{C}$ extended with CDCs, and let $\Delta$ consist of CDCs and $\mathcal{C}$-constraints. We say that the $\mathcal{C}$-constraints are $\mathcal{S}$-separable in $\Delta$ from the CDCs if every horizontal decomposition is lossless under $\Delta$ exactly when it is lossless under $\mathrm{cdc}(\Delta^*)$, where $\Delta^*$ denotes the $\mathcal{S}$-closure of $\Delta$.$^{10}$ We say that the $\mathcal{C}$-constraints are separable if there is some $\mathcal{S}$ for which they are $\mathcal{S}$-separable.

Thus, to check whether a horizontal decomposition $\Sigma$ is lossless under an $\mathcal{S}$-separable combination $\Delta$ of CDCs and other constraints, one can proceed as follows:

1) compute the deductive closure $\Delta^*$ of $\Delta$ w.r.t. $\mathcal{S}$, which makes explicit the interaction between CDCs and the other constraints in $\Delta$ by adding entailed constraints;

2) by using the technique of Section 4, check whether $\Sigma$ is lossless under $\mathrm{cdc}(\Delta^*)$, that is, the set obtained by discarding from $\Delta^*$ all of the constraints that are not CDCs.

Observe that $\mathcal{S}$-separability implies $\mathcal{S}'$-separability for every sound $\mathcal{S}' \supseteq \mathcal{S}$.

5.1 Functional Dependencies 

We begin our investigation of separability by showing 
that FDs do not interact with CDCs and so, as far as the 
losslessness of horizontal decomposition is concerned, 
they can be freely allowed in combination with them. 

Theorem 5. Let $\Delta$ be a set of CDCs and FDs. Then, the FDs are $\emptyset$-separable in $\Delta$ from the CDCs.

Proof. We will prove that a horizontal decomposition is lossless under $\Delta$ if and only if it is lossless under $\mathrm{cdc}(\Delta)$.

"if". We have that $\mathrm{cdc}(\Delta) \subseteq \Delta$ and, in turn, $\Delta$ entails $\mathrm{cdc}(\Delta)$; therefore $\mathrm{cdc}(\Delta) \models \varphi$ implies $\Delta \models \varphi$.$^{11}$

"only if". Whenever a horizontal decomposition is not lossless, by Lemma 3 there is a witness instance $I$ with only one tuple. Since the violation of an FD involves at least two tuples, $I$ satisfies all of the FDs in $\Delta$.$^{12}$ □
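
The observation behind the "only if" direction, namely that a one-tuple instance cannot violate an FD, can be made concrete with a small Python sketch. The function name `satisfies_fd` and the representation of an FD as two tuples of 0-based positions are assumptions made for this illustration.

```python
def satisfies_fd(instance, lhs, rhs):
    """Check the FD lhs -> rhs on a collection of tuples.

    lhs and rhs are tuples of 0-based positions (hypothetical encoding).
    """
    seen = {}
    for t in instance:
        key = tuple(t[i] for i in lhs)
        value = tuple(t[i] for i in rhs)
        # A violation needs two tuples agreeing on lhs but not on rhs.
        if seen.setdefault(key, value) != value:
            return False
    return True

# A one-tuple witness satisfies every FD, whatever lhs and rhs are.
witness = {("a", "b", 0)}
print(satisfies_fd(witness, (0,), (1, 2)))
```

With a single tuple the dictionary `seen` never holds a conflicting entry, which is exactly why the FDs can be discarded when looking for counterexamples to losslessness.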

5.2 Unary Inclusion Dependencies 

Since in general it is not possible to compare values from $\mathbf{dom}$ with values from $\mathbf{idom}$, we consider only UINDs of the form $R[i] \subseteq R[j]$ where positions $i$ and $j$ are either both non-interpreted or both interpreted. We refer to the UINDs in the former case as X-UINDs and in the latter as Y-UINDs. Let $n = |R|$ and $k = \|R\|$; we write $R[x_i] \subseteq R[x_j]$ with $i, j \in \{1, \ldots, k\}$ to denote the X-UIND $R[i] \subseteq R[j]$ and we write $R[y_i] \subseteq R[y_j]$ with $i, j \in \{1, \ldots, n-k\}$ to denote the Y-UIND $R[i+k] \subseteq R[j+k]$.

9. We assume the reader to be familiar with the standard notions (from proof theory) of inference rule, soundness, deductive closure.

10. As the constraints that are not CDCs are in any case filtered out from $\Delta^*$, it does not matter whether $\mathcal{C}$ extended with CDCs is closed under $\mathcal{S}$ or not.

11. Recall that $\varphi = \forall \bar{x}, \bar{y}\,.\, R(\bar{x}, \bar{y}) \leftrightarrow \bigvee_{i=1}^{n} V_i(\bar{x}, \bar{y})$.

12. As a matter of fact, it satisfies any FD.

UINDs on Interpreted Attributes 

First, we study the interaction between Y-UINDs (that is, UINDs at interpreted positions) and a restricted form of CDCs, which we shall introduce shortly. This interaction is captured by the following domain propagation rule:

$$\frac{\top \to \delta(y_i) \qquad R[y_j] \subseteq R[y_i]}{\top \to \delta(y_j)} \qquad (\mathrm{dp})$$

whose soundness is easily shown below.

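Theorem 6. Let $\Delta$ be a set of CDCs and UINDs. If $\Delta \models \forall \bar{x}, \bar{y}\,.\, R(\bar{x}, \bar{y}) \to \delta(y_i)$ and $\Delta \models R[y_j] \subseteq R[y_i]$, then $\Delta \models \forall \bar{x}, \bar{y}\,.\, R(\bar{x}, \bar{y}) \to \delta(y_j)$.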

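Proof. If $\Delta$ is inconsistent, the claim follows trivially. Thus, let $I$ be a model of $\Delta$; hence $I$ satisfies the CDC $\forall \bar{x}, \bar{y}\,.\, R(\bar{x}, \bar{y}) \to \delta(y_i)$ and the UIND $R[y_j] \subseteq R[y_i]$. If $R^I = \emptyset$, then trivially $I \models \forall \bar{x}, \bar{y}\,.\, R(\bar{x}, \bar{y}) \to \delta(y_j)$. So, let $R^I \neq \emptyset$ and suppose $I \not\models \forall \bar{x}, \bar{y}\,.\, R(\bar{x}, \bar{y}) \to \delta(y_j)$. Then, there exists $t \in R^I$ for which $\delta(t[j+k])$ does not hold true, with $k = \|R\|$. By the UIND, there must be $t' \in R^I$ such that $t'[i+k] = t[j+k]$. Hence $\delta(t'[i+k])$ is not true, in contradiction of $I \models \forall \bar{x}, \bar{y}\,.\, R(\bar{x}, \bar{y}) \to \delta(y_i)$. □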
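
To make the effect of the rule (dp) tangible, the following Python sketch computes the closure of a set of constraints under it. The tuple encoding is hypothetical: `("cdc", i, delta)` stands for $\top \to \delta(y_i)$, where `delta` is any hashable label for the condition, and `("uind", j, i)` stands for $R[y_j] \subseteq R[y_i]$.

```python
def dp_closure(constraints):
    """Close a set of constraints under the domain propagation rule (dp).

    ("cdc", i, delta)  encodes  T -> delta(y_i)
    ("uind", j, i)     encodes  R[y_j] included in R[y_i]
    """
    closed = set(constraints)
    changed = True
    while changed:
        changed = False
        cdcs = [c for c in closed if c[0] == "cdc"]
        uinds = [c for c in closed if c[0] == "uind"]
        for _, i, delta in cdcs:
            for _, j, target in uinds:
                derived = ("cdc", j, delta)
                # (dp): from T -> delta(y_i) and R[y_j] <= R[y_i]
                # derive T -> delta(y_j).
                if target == i and derived not in closed:
                    closed.add(derived)
                    changed = True
    return closed

# The constraints of Example 6: T -> y2 > 3 and R[y1] included in R[y2].
delta6 = {("cdc", 2, "> 3"), ("uind", 1, 2)}
print(("cdc", 1, "> 3") in dp_closure(delta6))
```

The loop terminates because only finitely many triples `("cdc", j, delta)` can be derived from a finite input; discarding the `"uind"` entries from the result then gives the set of CDCs on which a losslessness check would be run.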

It turns out that when all of the CDCs that mention a variable $y$ corresponding to an interpreted position affected by some Y-UIND have the form $\top \to \delta(y)$, the domain propagation rule fully captures the interaction between such CDCs and Y-UINDs w.r.t. losslessness.

Definition 9. We say that a set $\Delta$ of CDCs and Y-UINDs is dp-controllable if for every Y-UIND $R[y_i] \subseteq R[y_j]$ in $\Delta$ with $i \neq j$, all of the CDCs in $\Delta$ mentioning the variable $y$, where $y$ is $y_i$ or $y_j$, are of the form $\top \to \delta(y)$.

Theorem 7. Let $\Delta$ be a dp-controllable set of CDCs and Y-UINDs. Then, the Y-UINDs are $\{(\mathrm{dp})\}$-separable in $\Delta$ from the CDCs. □

The above theorem is a special case of a more general 
result (Theorem 10) given later on. 

Even though in general dp-controllability is not a necessary condition for the $\{(\mathrm{dp})\}$-separability of Y-UINDs from CDCs, the following examples show two different situations where, in the absence of dp-controllability, the Y-UINDs are not $\{(\mathrm{dp})\}$-separable from the CDCs.

Example 7. Let $R$ be a ternary relation symbol, whose last two positions are interpreted over the integers. Let $\Delta$ consist of the Y-UIND $R[y_1] \subseteq R[y_2]$ and of the CDCs $x_1 = a \to y_2 > 2$, $x_1 \neq a \to y_1 > 0$, $x_1 \neq a \to y_1 \le 0$, and consider the view symbol $V\colon y_1 > 1$. For $x_1 \neq a$ there is no suitable value for $y_1$ to satisfy the above CDCs, thus every model $I$ of $\Delta$ is such that, for every $t \in R^I$, $t[1] = a$ and $t[3] > 2$. Moreover, by the Y-UIND $R[y_1] \subseteq R[y_2]$, we also have that $t[2] > 2$, and therefore every tuple in $R^I$ is also in $V^I$, which means that $V$ is lossless under $\Delta$. Clearly, this is not the case in the absence of the Y-UIND, that is, under $\mathrm{cdc}(\Delta)$. Let $\Delta^*$ be the $\{(\mathrm{dp})\}$-closure of $\Delta$. Then, as $\Delta^* = \Delta$, we have that $V$ is lossy under $\mathrm{cdc}(\Delta^*)$ and, therefore, the Y-UIND is not $\{(\mathrm{dp})\}$-separable in $\Delta$ from the CDCs.

Example 8. Let $R$ be a relation symbol of arity 4 and with all of its positions interpreted over the integers. Consider the view symbol $V\colon y_3 < 3 \wedge y_4 > 4$, and let $\Delta$ consist of the Y-UIND $R[y_1] \subseteq R[y_2]$ and the CDCs $\top \to y_1 + y_3 > 0$, $\top \to y_2 + y_4 \le 0$, $\top \to y_3 - y_4 \le 0$. The above CDCs entail $\top \to y_1 - y_2 \ge 1$, thus in every model $I$ of $\Delta$ each tuple $t \in R^I$ must be such that $t[1] - t[2] \ge 1$.

By the Y-UIND $R[y_1] \subseteq R[y_2]$, for each $d$ in $\pi_2(R^I)$$^{13}$ there exists $d' \in \pi_2(R^I)$ with $d' \ge d + 1$. Then, as $d' \neq d$, the instance $I$ is either infinite or empty. Hence, every horizontal decomposition is lossless under $\Delta$.

On the other hand, let $\Delta^*$ be the $\{(\mathrm{dp})\}$-closure of $\Delta$ and observe that $\Delta^* = \Delta$. Let $J = \{R(1, 0, 0, 0)\}$; then, since $J \models \mathrm{cdc}(\Delta^*)$, $V$ is lossy under $\mathrm{cdc}(\Delta^*)$. Therefore, the Y-UIND is not $\{(\mathrm{dp})\}$-separable in $\Delta$ from the CDCs.

UINDs on Non-Interpreted Attributes
We now turn our attention to combinations of CDCs and X-UINDs (i.e., UINDs at non-interpreted positions). First, we show that the syntactic restrictions introduced in [17] on the CDCs are not sufficient for the $\emptyset$-separability of X-UINDs. Indeed, the following is a counterexample to Theorem 7 of [17].

Example 9. Let $R$ be a ternary relation symbol, with the third position interpreted over the integers. Let $\Delta$ consist of the CDC $x_2 = a \to y_1 < 0 \wedge y_1 > 0$ and the X-UIND $R[1] \subseteq R[2]$. The CDCs in $\Delta$ are trivially non-overlapping with the UINDs [17] and partition-free [17]. Consider the horizontal decomposition $\Sigma$ specified by the selections $V_1\colon x_1 \neq a$, $V_2\colon x_2 \neq b$ and $V_3\colon y_1 \neq 0$. Observe that every tuple other than $(a, b, 0)$ is captured by at least one of the above selections. Let $I = \{R(a, b, 0)\}$; clearly, $I \models \mathrm{cdc}(\Delta) \cup \Sigma$ but $I \not\models \varphi$, hence $\Sigma$ is not lossless under $\mathrm{cdc}(\Delta)$. However, $\Sigma$ is lossless under $\Delta$ as every model of $\Delta \cup \Sigma$ also satisfies $\varphi$. This is due to the fact that there exists no instance $J$ such that $J \models \Delta$ and $R(a, b, 0) \in J$. Indeed, to satisfy the UIND $R[1] \subseteq R[2]$, such an instance $J$ must also contain a tuple $t \in R^J$ with $t[2] = a$ which, on the other hand, does not satisfy the CDC $x_2 = a \to y_1 < 0 \wedge y_1 > 0$. Hence, the X-UINDs are not $\emptyset$-separable in $\Delta$ from the CDCs.

Below, we introduce a restriction on the CDCs, which ensures the $\emptyset$-separability of the X-UINDs.

Definition 10. A set $\Delta$ of CDCs is globally consistent if for every $\|R\|$-tuple $t_x$ of $\mathbf{dom}$ constants, there is a tuple $t_y$ of $|R| - \|R\|$ values from $\mathbf{idom}$ such that the instance $\{R(t)\}$, with $t = (t_x, t_y)$, is a model of $\Delta$.

13. $\pi_i$ denotes projection on the $i$-th position.

Note that $\mathrm{cdc}(\Delta)$ in Example 9 is not globally consistent.

Theorem 8. Let $\Delta$ consist of CDCs and X-UINDs such that $\mathrm{cdc}(\Delta)$ is globally consistent. Then, the X-UINDs are $\emptyset$-separable in $\Delta$ from the CDCs. □

The above theorem, like Theorem 7, is a special case 
of a more general result (Theorem 10) given later on. 

It is possible to check for the global consistency of a
set $\Delta$ of CDCs in a way similar to the one described in
Theorem 1 for consistency, by building the propositional
theory $\Pi_\Delta$ associated with $\Delta$, along with the auxiliary
theory $\Pi_{\mathrm{aux}}$ for $\Pi_\Delta$, and then checking that the $\alpha$-filtering
of $\Pi_\Delta$ is satisfiable for every (rather than just for one)
valuation $\alpha$ of $\mathrm{var}_p(\Pi_\Delta)$ that satisfies $\Pi_{\mathrm{aux}}$. Indeed, under
the assumptions of Theorem 1, $\Delta$ is globally consistent
if and only if $\Pi_\Delta^\alpha$ is satisfiable for every valuation $\alpha$ of
$\mathrm{var}_p(\Pi_\Delta)$ satisfying $\Pi_{\mathrm{aux}}$. Checking for global consistency
is expensive, because it requires an exponential number
of satisfiability checks in $\mathcal{L}$; the associated decision problem
is in PSPACE (the space used for one satisfiability
check can be reused for the next) for UTVPIs as well as
for BUTVPIs.
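To make the "for every valuation" requirement concrete, the following sketch checks Definition 10 by brute force. It is a naive stand-in, not the propositional-theory construction described above: CDCs are modelled as hypothetical pairs of Python predicates, the list `CONSTANTS` is assumed to contain the constants mentioned in the antecedents plus one unmentioned constant, and satisfiability of the consequents is approximated by searching a bounded integer `WINDOW`.

```python
from itertools import product

def globally_consistent(cdcs, constants, n_x, n_y, window):
    """Brute-force reading of Definition 10 over finite samples.

    cdcs: list of (antecedent, consequent) predicates over t_x and t_y.
    Returns (True, None), or (False, t_x) for the first t_x without a witness.
    """
    for t_x in product(constants, repeat=n_x):
        # Consequents of the CDCs whose antecedent holds on t_x.
        active = [cons for ant, cons in cdcs if ant(t_x)]
        has_witness = any(all(cons(t_y) for cons in active)
                          for t_y in product(window, repeat=n_y))
        if not has_witness:
            return (False, t_x)
    return (True, None)

# CDC of Example 9: x2 = a -> y1 < 0 and y1 > 0.
example9 = [(lambda x: x[1] == "a", lambda y: y[0] < 0 and y[0] > 0)]
# The set Delta_2 of Example 10: x1 = a -> y1 > 0 and x2 = a -> y1 > 1.
delta2 = [(lambda x: x[0] == "a", lambda y: y[0] > 0),
          (lambda x: x[1] == "a", lambda y: y[0] > 1)]

CONSTANTS = ["a", "b"]   # "b" stands for a constant not mentioned in the CDCs
WINDOW = range(-3, 4)    # assumed bounded search space for integer values

print(globally_consistent(example9, CONSTANTS, 2, 1, WINDOW))  # (False, ('a', 'a'))
print(globally_consistent(delta2, CONSTANTS, 2, 1, WINDOW))    # (True, None)
```

The outer loop ranges over all combinations of constants, which is where the exponential number of satisfiability checks comes from; each inner search reuses the same space, in line with the PSPACE remark.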

Devising purely syntactic restrictions that guarantee
the global consistency of CDCs depends on the specific
constraint language $\mathcal{L}$ in use, which is indeed what we
overlooked in [17]. As it turns out, the non-overlapping
and partition-free restrictions of [17] ensure global consistency
(and so also the $\emptyset$-separability of X-UINDs) only
for sets of CDCs whose consequents are UTVPIs. This is
no longer the case for CDCs whose consequents are
BUTVPIs, which indeed make it possible to express Example 9.

We provide a condition that, although not guaranteeing
global consistency, ensures the $\emptyset$-separability of the
X-UINDs. Moreover, this restriction can be checked more
efficiently than global consistency, as it requires only a
polynomial number of $\mathcal{L}$-satisfiability checks.

Definition 11. Let $\Delta$ be a set of CDCs. We say that the
CDCs in $\Delta$ are disjoint w.r.t. an X-UIND $R[x_i] \subseteq R[x_j]$ if
for any two distinct CDCs$^{14}$ $\phi_1(\bar{x}_1, \bar{y}_1)$ and $\phi_2(\bar{x}_2, \bar{y}_2)$ in $\Delta$,
with $x_j$ in $\bar{x}_1$, the consequent of $\phi_1$ is satisfiable and has no
variables in common with the consequent of $\phi_2$.

Intuitively, the above requires that all of the variables
appearing in the consequent of any CDC $\phi$ whose antecedent
mentions the variable $x_j$ affected by an X-UIND
$R[x_i] \subseteq R[x_j]$ do not occur in the consequent of any other
CDC; moreover, the consequent of each such $\phi$ must be
satisfiable.

Theorem 9. Let $\Delta$ be a set of CDCs and X-UINDs, where
the CDCs are disjoint w.r.t. each X-UIND in $\Delta$. Then, the
X-UINDs are $\emptyset$-separable in $\Delta$ from the CDCs. □

The above theorem is a special case of a more general 
result (Theorem 11) given later on in Section 5.3. 

Clearly, as global consistency is a property of the CDCs
in isolation, whereas disjointness is relative to an X-UIND,
these two notions are incomparable, in the sense that
neither implies the other, as shown below.

14. Mistakenly, in [23] the CDCs were not required to be distinct.

Example 10. Let $R$ be a ternary relation symbol, whose
third position is interpreted over the integers, and let
$\psi$ be the X-UIND $R[1] \subseteq R[2]$. The set $\Delta_1$ consisting of
the CDCs $x_1 = a \rightarrow y_1 < 0$ and $x_1 = a \rightarrow y_1 > 0$ is not
globally consistent, as there is no suitable value for the
third position (associated with $y_1$) whenever the value
of the first (associated with $x_1$) is $a$; however, the CDCs
in $\Delta_1$ are disjoint w.r.t. $\psi$, since neither CDC mentions
the variable $x_2$ affected by $\psi$. On the other hand, the set
$\Delta_2$ consisting of $x_1 = a \rightarrow y_1 > 0$ and $x_2 = a \rightarrow y_1 > 1$ is
globally consistent, but it is not disjoint w.r.t. $\psi$, because
the second CDC mentions $x_2$ in its antecedent, and the
variable $y_1$ mentioned in its consequent also appears in
the consequent of the first CDC.$^{15}$
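The disjointness test of Definition 11 is largely syntactic, which the following sketch tries to convey on the two sets of Example 10. The `CDC` record, its field names and the bounded `WINDOW` used to approximate satisfiability are all assumptions made for illustration; the check follows the intuitive reading given after Definition 11, requiring a satisfiable consequent for every CDC whose antecedent mentions $x_j$.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, FrozenSet, Tuple

@dataclass(frozen=True)
class CDC:
    ant_vars: FrozenSet[int]    # indices j of the variables x_j in the antecedent
    cons_vars: FrozenSet[int]   # indices k of the variables y_k in the consequent
    cons: Callable[[Tuple[int, ...]], bool]  # consequent over the interpreted values

def satisfiable(cons, n_y, window):
    """Bounded stand-in for a satisfiability check in the constraint language."""
    return any(cons(t_y) for t_y in product(window, repeat=n_y))

def disjoint(cdcs, uind_rhs, n_y, window):
    """Check disjointness w.r.t. an X-UIND whose right-hand side is x_{uind_rhs}."""
    for k, phi in enumerate(cdcs):
        if uind_rhs not in phi.ant_vars:
            continue  # only CDCs mentioning x_j in the antecedent are constrained
        if not satisfiable(phi.cons, n_y, window):
            return False
        for m, other in enumerate(cdcs):
            if m != k and phi.cons_vars & other.cons_vars:
                return False  # a consequent variable is shared with another CDC
    return True

WINDOW = range(-3, 4)  # assumed bounded search space for integer values

# Delta_1: x1 = a -> y1 < 0 and x1 = a -> y1 > 0.
delta1 = [CDC(frozenset({1}), frozenset({1}), lambda y: y[0] < 0),
          CDC(frozenset({1}), frozenset({1}), lambda y: y[0] > 0)]
# Delta_2: x1 = a -> y1 > 0 and x2 = a -> y1 > 1.
delta2 = [CDC(frozenset({1}), frozenset({1}), lambda y: y[0] > 0),
          CDC(frozenset({2}), frozenset({1}), lambda y: y[0] > 1)]

# The X-UIND R[1] <= R[2] affects x_2 on its right-hand side.
print(disjoint(delta1, 2, 1, WINDOW))  # True
print(disjoint(delta2, 2, 1, WINDOW))  # False
```

At most one satisfiability check is issued per CDC, in contrast with the one check per valuation needed for global consistency.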

UINDs on All Attributes 

We now study the separability of UINDs (i.e., X-UINDs
and Y-UINDs together)$^{16}$ from CDCs. The following is a
generalisation of both Theorem 7 and Theorem 8.

Theorem 10. Let $\Delta$ be a set of globally consistent CDCs,
X-UINDs and Y-UINDs, such that the CDCs and Y-UINDs
are dp-controllable. Then, the UINDs are {(dp)}-separable in
$\Delta$ from the CDCs.

To give the proof of the above theorem, we will need 
to prove several lemmas, showing how any given model 
of the (saturated set of) CDCs can be extended in order 
to satisfy the UINDs as well. 

Lemma 6. Let $\Delta$ be a dp-controllable set of CDCs and Y-UINDs,
and let $t$ be a tuple such that $\{R(t)\} \models \mathrm{cdc}(\Delta^*)$,
where $\Delta^*$ is the {(dp)}-closure of $\Delta$. Let $\psi = R[i] \subseteq R[j]$
be a Y-UIND in $\Delta$, and let $t'$ be identical to $t$ except for
$t'[j] = t[i]$. Then, $\{R(t')\} \models \mathrm{cdc}(\Delta^*) \cup \{\psi\}$.

Proof. Let $\Delta' = \mathrm{cdc}(\Delta^*)$. Since all of the UINDs in $\Delta$ are
Y-UINDs, $i, j > k$ with $k = \|R\|$. As $t$ satisfies $\Delta'$ and $t'$
differs from $t$ only on the $j$-th element, $t'$ satisfies every
CDC in $\Delta'$ not mentioning the variable $y_{j-k}$. The only
CDCs in $\Delta'$ which are allowed to mention $y_{j-k}$ have the
form $\top \rightarrow \delta(y_{j-k})$. For each such CDC, since $R[i] \subseteq R[j]$
is in $\Delta^*$, by (dp) also $\top \rightarrow \delta(y_{i-k})$ is in $\Delta'$. Hence $\delta(t[i])$
holds true, and in turn $\delta(t'[j])$ is true as well, because
$t'[j] = t[i]$. Therefore, $t'$ satisfies all the CDCs of the form
$\top \rightarrow \delta(y_{j-k})$. Moreover, $t'$ trivially satisfies the UIND $\psi$,
as $t'[i] = t'[j] = t[i]$. □

Lemma 7. Let $\Delta$ be a dp-controllable set of CDCs and Y-UINDs,
and let $I$ be a model of $\mathrm{cdc}(\Delta^*)$, where $\Delta^*$ is the
{(dp)}-closure of $\Delta$. Then, there exists an instance $J \supseteq I$
such that $J \models \Delta^*$.

15. The example given in [23] is incorrect. 

16. Recall that UINDs between non-interpreted and interpreted positions
are not allowed, as they make little sense.

Proof. Let $J_0 = I$. We will iteratively add tuples to $J_0$ so
as to obtain a model of $\Delta^*$. At each iteration $k$, proceed
as follows:

1) Find a violation of some Y-UIND $R[i] \subseteq R[j]$ in $\Delta^*$,
that is, a value $d \in \pi_i(R^{J_k})$ which is not in $\pi_j(R^{J_k})$.

2) Take $t \in R^{J_k}$ such that $t[i] = d$ and $t[j] \neq d$.

3) Let $J_{k+1} = J_k \cup \{R(t')\}$, with $t'$ identical to $t$ except
for $t'[j] = d$.

In the worst case, to satisfy all the Y-UINDs, for every
pair of interpreted positions $p$ and $q$ the above procedure
will have to make the projections on $p$ and $q$ equal. This
is possible because, after each iteration $k$, $\mathrm{adom}(J_{k+1}) =
\mathrm{adom}(J_k)$,$^{17}$ as $t'$ does not introduce new constants from
dom or idom. In such a worst-case scenario, for each
tuple $t \in R^I$ and every interpreted position $p$, the value
$t[p]$ will be copied to every interpreted position other
than $p$, resulting in the insertion of $r - 1$ new tuples, with
$r = |R| - \|R\|$ (i.e., the number of interpreted positions
in $R$). The total number of tuples added to $I$ equals at
most $m \cdot r \cdot (r - 1)$, where $m$ is the number of tuples in
$I$, and therefore the procedure terminates after finitely
many steps, yielding an instance $J \supseteq I$ that satisfies all
the Y-UINDs in $\Delta^*$ by construction.

Let $\Delta' = \mathrm{cdc}(\Delta^*)$. To conclude the proof, we show by
induction that $J$ satisfies $\Delta'$. The base case is $J_0 \models \Delta'$.
Observe that $\{R(t)\} \models \Delta'$, because $\Delta'$ consists of CDCs.
Then, assuming $J_k \models \Delta'$, we have that $J_{k+1} \models \Delta'$, since
$\{R(t')\} \models \Delta'$ by Lemma 6. □
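The three-step procedure in the proof of Lemma 7 can be sketched as follows. This is an illustrative rendering only: the function name and the encoding of a Y-UIND $R[i] \subseteq R[j]$ as a pair `(i, j)` of 1-based positions are assumptions, and the code repairs the UINDs without checking the CDCs, whose preservation is what Lemma 6 establishes under dp-controllability.

```python
def chase_y_uinds(instance, uinds):
    """Extend `instance` until every UIND (i, j), read as R[i] <= R[j], holds.

    Tuples are plain Python tuples; positions are 1-based, as in the text.
    """
    current = set(instance)
    while True:
        violation = None
        for i, j in uinds:
            col_j = {t[j - 1] for t in current}
            for t in current:
                if t[i - 1] not in col_j:    # step 1: d = t[i] is missing at position j
                    violation = (t, i, j)    # step 2: t[i] = d and t[j] != d
                    break
            if violation is not None:
                break
        if violation is None:
            return current                   # all UINDs are satisfied
        t, i, j = violation
        t_new = list(t)
        t_new[j - 1] = t[i - 1]              # step 3: t' equals t except t'[j] = d
        current.add(tuple(t_new))

# One non-interpreted position followed by two interpreted ones,
# with the Y-UINDs R[2] <= R[3] and R[3] <= R[2].
start = {("a", 1, 2)}
result = chase_y_uinds(start, [(2, 3), (3, 2)])
print(sorted(result))  # [('a', 1, 1), ('a', 1, 2), ('a', 2, 2)]
```

In this run $m = 1$ and $r = 2$, so the two added tuples match the bound $m \cdot r \cdot (r - 1) = 2$, and no value outside the active domain of the initial instance is introduced.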

Lemma 8. Let $\Delta$ consist of X-UINDs and globally consistent
CDCs, and let $I$ be a model of $\mathrm{cdc}(\Delta)$. Then, there exists an
instance $J$ such that $J \supseteq I$ and $J \models \Delta$.

Proof. Let $\Delta' = \mathrm{cdc}(\Delta)$ and $J_0 = I$. We will show how
to build a model $J \supseteq I$ of $\Delta$ by iteratively adding tuples
to $J_0$. At each iteration $k$, proceed as follows:

1) Find a violation of some X-UIND $R[i] \subseteq R[j]$ in $\Delta$, i.e.,
a dom constant $a \in \pi_i(R^{J_k})$ which is not in $\pi_j(R^{J_k})$.

2) Take $t \in R^{J_k}$ such that $t[i] = a$ and $t[j] \neq a$.

3) Let $J_{k+1} = J_k \cup \{R(t')\}$, where $t'$ agrees with $t$ at
non-interpreted positions except for $t'[j] = a$. Suitable values
of $t'$ at interpreted positions exist by Definition 10
as $\Delta'$ is globally consistent.

In the worst case, to satisfy all the X-UINDs, for each pair
of non-interpreted positions $p$ and $q$ the above procedure
will have to make the projections on $p$ and $q$ equal. This
is possible because, after each iteration $k$, $\mathrm{adom}(J_{k+1}) \cap
\mathrm{dom} = \mathrm{adom}(J_k) \cap \mathrm{dom}$, as $t'$ does not contain any new
constant from dom (though it may contain new values
from idom). In this worst-case scenario, for each tuple
$t \in I$ and every non-interpreted position $p$, the value $t[p]$
will be copied to every non-interpreted position other
than $p$, resulting in the insertion of $\|R\| - 1$ new tuples. The
total number of tuples added to $I$ is at most equal to
$m \cdot \|R\| \cdot (\|R\| - 1)$, where $m$ is the number of tuples in
$I$, and therefore the procedure terminates after finitely
many steps, yielding an instance $J \supseteq I$ that satisfies all
the X-UINDs in $\Delta$ by construction.

17. As a matter of fact, $\mathrm{adom}(J_{k+1}) \cap \mathrm{idom} = \mathrm{adom}(J_k) \cap \mathrm{idom}$
suffices.

To conclude the proof, we show by induction that $J$
is a model of $\Delta'$. The base case is $J_0 \models \Delta'$. Observe that
$\{R(t)\} \models \Delta'$, since $\Delta'$ consists of CDCs. Then, assuming
$J_k \models \Delta'$, we have that $J_{k+1} \models \Delta'$ as $\{R(t')\} \models \Delta'$ by the
global consistency of $\Delta'$. □

Proof of Theorem 10. Let $\Delta^*$ be the closure of $\Delta$ under
{(dp)}, and let $\Delta' = \mathrm{cdc}(\Delta^*)$. Observe that $\Delta^* \equiv \Delta$, as
(dp) is sound by Theorem 6. According to Definition 8,
we will show that $\Delta \cup \Sigma \models \varphi$ if and only if $\Delta' \cup \Sigma \models \varphi$.

"if". As $\Delta' \cup \Sigma \subseteq \Delta^* \cup \Sigma$, every model of $\Delta^* \cup \Sigma$ is also
a model of $\Delta' \cup \Sigma$. Hence, $\Delta^* \cup \Sigma \models \Delta' \cup \Sigma$ and, since
$\Delta^* \equiv \Delta$, in turn $\Delta \cup \Sigma \models \Delta' \cup \Sigma$. Therefore, $\Delta \cup \Sigma \models \varphi$
whenever $\Delta' \cup \Sigma \models \varphi$.

"only if". By contraposition. Assume that $\Delta' \cup \Sigma \not\models \varphi$.
Then, as $\Delta'$ consists solely of CDCs, by Lemma 3 there is
a tuple $t$ such that $I = \{R(t)\}$ satisfies $\Delta' \cup \Sigma$. In turn, as
$\Delta'$ is over $R$, $I$ is also a model of $\Delta'$. By Lemma 8, there
exists an instance $J' \supseteq I$ satisfying all of the X-UINDs
in $\Delta^*$ and, by Lemma 7, there exists $J'' \supseteq J'$ satisfying
all of the Y-UINDs in $\Delta^*$. Moreover, by construction, for
each tuple in $J''$ there is a tuple in $J'$ having the same
values at non-interpreted positions, thus $J''$ also satisfies
all of the X-UINDs in $\Delta^*$. Therefore, $J''$ is a model of $\Delta^*$
and, as $\Delta^* \equiv \Delta$, of $\Delta$ as well. Let $J$ be the instance over
$\mathcal{R} \cup \mathcal{V}$ with $R^J = R^{J''}$ (the extension of each $V_i$ under $J$
is unambiguously determined by $R^J$). Clearly, $J \models \Delta \cup \Sigma$
but $J \not\models \varphi$, because $t \in R^J$ while $t \notin V_1^J \cup \cdots \cup V_n^J$. □

Observe that Theorems 7 and 8 are direct corollaries
of Theorem 10. The proof of Theorem 7 is analogous to
the one above, with the difference that in the "only if"
direction, as $\Delta$ does not contain X-UINDs, Lemma 8 is
not needed in order to build the instance $J'$ (simply take
$J' = I$), and therefore the CDCs are not required to be
globally consistent. The proof of Theorem 8 is also very
similar to the one above, with the difference that, since $\Delta$
does not contain Y-UINDs, there is no need to compute
$\Delta^*$, which in this case is always equal to $\Delta$. Hence, we
obtain $\emptyset$-separability rather than {(dp)}-separability. The
"if" direction works also with $\Delta^* = \Delta$, which is indeed
a special case, while for the "only if" direction one can
simply take $J'' = J'$ (as Lemma 7 is not needed).

Next, we show that replacing global consistency of the 
CDCs in the assumptions of Theorem 10 by disjointness 
w.r.t. the X-UINDs yields another sufficient condition for 
the {(dp)}-separability of the UINDs from the CDCs. 

Theorem 11. Let $\Delta$ be a set of CDCs and UINDs such that
the CDCs are disjoint w.r.t. each X-UIND in $\Delta$, and the
CDCs and Y-UINDs are dp-controllable. Then, the UINDs
are {(dp)}-separable in $\Delta$ from the CDCs. □

The proof of the above theorem is analogous to that of
Theorem 10, with the difference that in the "only if"
direction the existence of the instance $J'$ is guaranteed
by the following lemma rather than Lemma 8.

Lemma 9. Let $\Delta$ be a set of X-UINDs and CDCs such that
the CDCs are disjoint w.r.t. each X-UIND in $\Delta$, and let $I$ be
a model of $\mathrm{cdc}(\Delta)$. Then, there exists an instance $J$ such that
$J \supseteq I$ and $J \models \Delta$.

Proof. The construction of $J$ is the same as in Lemma 8,
with the only difference that, in step 3 of the procedure,
the existence of suitable values for $t'$ at interpreted positions
is guaranteed by the disjointness of the CDCs w.r.t.
the X-UINDs, as shown below.

We say that a CDC applies on an instance if that instance
contains a tuple whose values at non-interpreted
positions make the antecedent of the CDC true. Denote
by $\Delta'_k$ and $\Delta'_{k+1}$ the sets of CDCs in $\Delta'$ that apply on $J_k$
and $J_{k+1}$, respectively. For a CDC $\phi$, let $\mathrm{pos}(\phi)$ be the set
of (interpreted) positions corresponding to the variables
mentioned in the consequent of $\phi$.

a) Let $\psi \in \Delta'_{k+1} \cap \Delta'_k$. Clearly, there exist suitable values
for $t'$ at interpreted positions so that $\{R(t')\} \models \psi$ (just
take the corresponding values from $t$).

b) Let $\phi \in \Delta'_{k+1} \setminus \Delta'_k$. The values at non-interpreted positions
in $t$ and $t'$ differ only at position $j$, thus the antecedent
of $\phi$ mentions the variable $x_j$. Then, since position $j$ is
affected by the r.h.s. of the X-UIND $R[i] \subseteq R[j]$ under
consideration, by Definition 11 the consequent of $\phi$ is
satisfiable and the variables occurring therein are not
mentioned in any other CDC in $\Delta'$. Thus, there exist
suitable values for $t'$ at interpreted positions such that
$\{R(t')\} \models \phi$, where the values at interpreted positions
not in $\mathrm{pos}(\phi)$ can be chosen freely.

c) Let $\phi, \phi' \in \Delta'_{k+1} \setminus \Delta'_k$ and $\psi \in \Delta'_{k+1} \cap \Delta'_k$; then $\mathrm{pos}(\phi)$,
$\mathrm{pos}(\phi')$ and $\mathrm{pos}(\psi)$ are pairwise disjoint.

From all of the above, we conclude that there are suitable
values for $t'$ at interpreted positions so that the instance
$\{R(t')\}$ satisfies $\Delta'_{k+1}$ and, in turn, $\Delta'$. □

Theorem 9 is a direct consequence of Theorem 11, in 
the same way as Theorem 8 follows from Theorem 10. 

5.3 FDs and UINDs Together 

Unfortunately, the separability results presented above
for combinations of CDCs and UINDs do not automatically
carry over to the case in which FDs are also present.
In fact, although FDs do not directly interact with CDCs,
they do in general interact with UINDs,$^{18}$ which in turn
interact with CDCs.

We write FDs over $R$ as implications between sets of
positions of $R$ (e.g., $\{1, 3\} \rightarrow \{4\}$). We call X-FD (resp.,
Y-FD) an FD whose l.h.s. and r.h.s. both consist of non-interpreted
(resp., interpreted) positions; and we call XY-FD
(resp., YX-FD) an FD where the l.h.s. consists of non-interpreted
(resp., interpreted) positions and the r.h.s. of
interpreted (resp., non-interpreted) ones.

The following generalises Theorem 7 in the presence 
of X-FDs and YX-FDs. 

18. The interaction between FDs and UINDs can be fully captured, as
there is a sound and complete axiomatization for the finite implication
of FDs and UINDs [11].


Theorem 12. Let $\Delta$ be a set of CDCs, Y-UINDs, X-FDs and
YX-FDs, where the CDCs and Y-UINDs are dp-controllable.
Then, the X-FDs, YX-FDs and Y-UINDs are {(dp)}-separable
in $\Delta$ from the CDCs.

Proof. The proof given for Theorem 10 can be modified
as follows: in the "only if" direction take $J' = I$, which
contains only the tuple $t$, and construct $J''$ as in Lemma 7
(with $\Delta^* \equiv \Delta$) by extending $J'$ with tuples that have the
same values as $t$ at non-interpreted positions. Therefore,
$J''$ satisfies any FD whose r.h.s. is a set of non-interpreted
positions. □

Theorem 8 does not hold anymore in the presence of Y-FDs,
that is, X-UINDs and Y-FDs are not $\emptyset$-separable in
general from globally consistent CDCs, as shown below.

Theorem 13. There is a set of X-UINDs, Y-FDs and globally
consistent CDCs, in which the X-UINDs and Y-FDs are not
$\emptyset$-separable from the CDCs.

Proof. Let $R$ be a relation symbol of arity 4, whose last
two positions are interpreted over the integers. Let $\Delta$
consist of the X-UIND $R[1] \subseteq R[2]$, the Y-FD $R \colon \{3\} \rightarrow
\{4\}$, and the following CDCs:

$x_1 = a \wedge x_2 = b \rightarrow y_1 = 0 \wedge y_2 > 1$;

$x_2 = a \rightarrow y_1 = 0 \wedge y_2 < 1$.

The above CDCs are globally consistent, since their consequents
are satisfiable and their antecedents are never
true at the same time (as $x_2$ cannot be simultaneously
equal to $b$ and $a$). Let $\Sigma$ be the horizontal decomposition
specified by $V_1 \colon x_1 \neq a$ and $V_2 \colon x_2 \neq b$. Clearly, $\Sigma$ is lossy
under $\mathrm{cdc}(\Delta)$ as the instance $I = \{R(a, b, 0, 2)\}$ satisfies
$\mathrm{cdc}(\Delta)$ and $\Sigma$; indeed, the tuple $(a, b, 0, 2)$ is in $R^I$ but it
is not selected by either $V_1$ or $V_2$. Suppose that $\Sigma$ is lossy
under $\Delta$. Then, there exists a model $J$ of $\Delta \cup \Sigma$ and a
tuple $t \in R^J$ such that $t \notin V_1^J \cup V_2^J$. By definition of $V_1$
and $V_2$, we have that $t[1] = a$ and $t[2] = b$ and, in turn,
$t[3] = 0$ and $t[4] > 1$ by the first CDC. By the X-UIND,
there must be $t' \in R^J$ such that $t'[2] = a$ and, in turn,
$t'[3] = 0$ and $t'[4] < 1$ by the second CDC. But then, $t$ and
$t'$ violate the Y-FD, since they agree on the third position
but must differ on the fourth. Hence, $J \not\models \Delta$, which is a
contradiction. Therefore, $\Sigma$ is lossless under $\Delta$, and we
conclude that the X-UIND and Y-FD are not $\emptyset$-separable
in $\Delta$ from the CDCs. □
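The counterexample in the proof of Theorem 13 can be checked exhaustively on a small finite fragment. The sketch below is only a sanity check under stated assumptions: the sampled sets `DOM` and `IDOM` and all helper names are hypothetical, and instances are enumerated only over tuples drawn from those sets, whereas the proof covers arbitrary instances.

```python
from itertools import chain, combinations, product

DOM = ["a", "b"]   # assumed finite sample of non-interpreted constants
IDOM = [0, 1, 2]   # assumed finite window of the integers

def satisfies_cdcs(t):
    """The two CDCs used in the proof of Theorem 13."""
    x1, x2, y1, y2 = t
    first = not (x1 == "a" and x2 == "b") or (y1 == 0 and y2 > 1)
    second = x2 != "a" or (y1 == 0 and y2 < 1)
    return first and second

def satisfies_uind(inst):
    """X-UIND R[1] <= R[2]."""
    return {t[0] for t in inst} <= {t[1] for t in inst}

def satisfies_fd(inst):
    """Y-FD {3} -> {4}: tuples agreeing on position 3 agree on position 4."""
    return all(s[3] == t[3] for s in inst for t in inst if s[2] == t[2])

def unselected(t):
    """True if t is selected by neither V1: x1 != a nor V2: x2 != b."""
    return t[0] == "a" and t[1] == "b"

# Only tuples satisfying the CDCs can occur in a model of cdc(Delta).
candidates = [t for t in product(DOM, DOM, IDOM, IDOM) if satisfies_cdcs(t)]
instances = [set(s) for s in chain.from_iterable(
    combinations(candidates, k) for k in range(len(candidates) + 1))]

lossy_under_cdcs = any(
    any(unselected(t) for t in inst) for inst in instances)
lossy_under_delta = any(
    any(unselected(t) for t in inst)
    for inst in instances
    if satisfies_uind(inst) and satisfies_fd(inst))

print(lossy_under_cdcs, lossy_under_delta)  # prints: True False
```

Within this fragment, a tuple escaping both views exists under the CDCs alone, but no instance that also satisfies the X-UIND and the Y-FD contains one, which is exactly the failure of $\emptyset$-separability.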

The CDCs in the above proof are globally consistent,
but not disjoint w.r.t. the X-UIND. However, Theorem 9
does not hold either in the presence of Y-FDs, that is, not
even disjointness is enough to ensure the $\emptyset$-separability
of X-UINDs and Y-FDs from CDCs.

Theorem 14. There exists a set of CDCs, X-UINDs and Y-FDs,
in which the CDCs are disjoint w.r.t. each X-UIND, but
the X-UINDs and Y-FDs are not $\emptyset$-separable from the CDCs.

Proof. Let $R$ be a relation symbol of arity 4 with its last
two positions interpreted over the integers. Let $\Delta$ consist
of the X-UIND $R[1] \subseteq R[2]$, the Y-FD $R \colon \{3\} \rightarrow \{4\}$, and
the CDC $x_2 = a \rightarrow y_1 = 0 \wedge y_2 = 2$, trivially disjoint w.r.t.
the X-UIND. Consider the horizontal decomposition $\Sigma$
specified by $V \colon x_1 \neq a \vee x_2 \neq b \vee y_1 \neq 0 \vee y_2 \neq 1$. Clearly,
$\Sigma$ is lossy under $\mathrm{cdc}(\Delta)$ because the instance $I = \{R(t)\}$,
where $t = (a, b, 0, 1)$, satisfies $\mathrm{cdc}(\Delta)$ and $\Sigma$; indeed, $t$ is
not selected by $V$. Suppose that $\Sigma$ is lossy under $\Delta$; since
$V$ selects any tuple other than $t$, there is a model $J$ of
$\Delta \cup \Sigma$ such that $t \in R^J$ but $t \notin V^J$. By the X-UIND, there
must be $t' \in R^J$ such that $t'[2] = a$ and, in turn, $t'[3] = 0$
and $t'[4] = 2$ by the CDC. But then, $t$ and $t'$ violate the
Y-FD, because they agree on the third position but differ
on the fourth. So $J \not\models \Delta$, which is a contradiction. Hence,
$\Sigma$ is lossless under $\Delta$, and we conclude that the X-UIND
and Y-FD are not $\emptyset$-separable in $\Delta$ from the CDCs. □

Table 1
Summary of $\mathcal{S}$-separability results (unr = unrestricted,
dpc = dp-controllable, gc = globally consistent, dis = disjoint).

Constraints                   CDCs         $\mathcal{S}$      Theorem
FDs                           unr          $\emptyset$        5
Y-UINDs                       dpc          {(dp)}             7
Y-UINDs + X-FDs + YX-FDs      dpc          {(dp)}             12
X-UINDs                       gc           $\emptyset$        8
X-UINDs                       dis          $\emptyset$        9
X-UINDs + Y-UINDs             dpc + gc     {(dp)}             10
X-UINDs + Y-UINDs             dpc + dis    {(dp)}             11

6 Discussion and Outlook 

In this article, we studied lossless horizontal decomposition
under constraints in a setting where the values for
some of the attributes in the schema are taken from an
interpreted domain. Data values in such a domain can be
compared in ways beyond equality, according to a first-order
language $\mathcal{L}$. We did not make any assumption on
$\mathcal{L}$, other than requiring it to be closed under negation.

In the above setting, we considered a class of integrity 
constraints, CDCs, based on those introduced in [17]. We 
have characterised the consistency of a set of CDCs in 
terms of satisfiability in $\mathcal{L}$ and we have shown that the
problem of deciding consistency is NP-complete when $\mathcal{L}$
is the language of either UTVPIs or BUTVPIs.

We considered a more general form of selections than 
in [17] and characterised, in terms of unsatisfiability in
$\mathcal{L}$, whether a horizontal decomposition specified by such
selections is lossless under CDCs. We have shown that
the problem of deciding losslessness is co-NP-complete
when $\mathcal{L}$ is the language of either UTVPIs or BUTVPIs.

We also considered losslessness under CDCs in combination
with FDs and UINDs. We introduced and studied
the important notion of separability, which indicates
whether constraints other than CDCs can be disregarded 
w.r.t. losslessness, after incorporating the effect of their 
interaction in terms of entailed CDCs. A summary of all 
the separability results presented in this article is given 
in Table 1. 


A promising direction for future research we are currently
investigating is the generalisation of the separability
results for UINDs to arbitrary inclusion dependencies
(INDs). Observe that INDs, unlike UINDs, can
affect both interpreted and non-interpreted attributes at
the same time, e.g., in $R[x_1, y_1] \subseteq R[x_2, y_2]$. Some care is
needed in allowing FDs in this setting as well, because
logical implication for unrestricted combinations of FDs
and INDs is undecidable and has no axiomatization [11].

Another interesting direction is that of allowing equalities
between two variables in the antecedents of CDCs
as well as in the selection conditions on non-interpreted
attributes of view definitions. We believe our approach
could be extended in this direction by representing such
equalities by propositional variables and by adding suitable
axioms to the auxiliary theory to handle transitivity
and symmetry.

The main motivation for our study of lossless horizontal
decomposition is that it provides the groundwork for
the consistent and unambiguous propagation of updates 
in the context of selection views. By applying the general 
criterion of [6], given a lossless horizontal decomposition 
it is possible to determine whether an update issued on 
some (possibly all) of the fragments can be propagated 
to the underlying database without affecting the other 
fragments. Similarly, it is possible to partition the source 
relation by adding suitable conditions in the selections 
that define the fragments, so that each is disjoint with 
the others. In general, a lossy horizontal decomposition 
can always be turned into a lossless one by defining an 
additional fragment, called a complement, which selects 
the missing tuples. In particular, there is a unique minimal 
complement selecting all and only the rows of the source 
relation that are not selected by any of the other fragments.
In follow-up work, we will show how to compute
the definition of such a complement, in the scope of an 
in-depth study of partitioning and update propagation 
in the setting studied in this article. 

Most of the work in the field of horizontal decomposition
has been carried out in the context of distributed
database systems, where one is mainly concerned with
finding an optimal decomposition w.r.t. some parameters
(e.g., workload, query-execution time, storage quotas),
rather than determining whether a given horizontal decomposition
is lossless.

De Bra ([13], [14]) developed a theory of horizontal decomposition
to partition a relation into two sub-relations
such that one satisfies certain FDs that the other does not.
The approach is based on constraints that capture partial
implications between sets of FDs and exceptions to sets
of FDs, for which a sound and complete set of inference
rules is provided. These constraints are $\emptyset$-separable from
our CDCs (for the same reason FDs are).

Maier and Ullman [16] consider horizontal decomposition
involving physical and virtual fragments over the
same attributes. Fragments are defined in an arbitrary
(first-order) language closed under Booleans, where entailment
is decidable and consisting of formulae that, as
in our case, can be evaluated by examining one tuple at
a time, in isolation from the others. Differently from our
case, the language can express equalities between
variables associated with non-interpreted attributes. But,
if such equalities are forbidden, the setting of [16] can be
recast into ours: the union of the physical fragments is
the single source relation $R$ we consider here, the definitions
of the physical fragments can be taken as integrity
constraints over $R$, and the definition of each virtual
fragment (given in terms of the physical fragments and
other virtual ones) can be expressed only in terms of $R$
by query unfolding. Then, the problem of determining
whether the virtual fragments constitute a lossless horizontal
decomposition of the physical fragments, which
is not addressed in [16], can be solved by applying the
techniques we described in this article. Virtual fragments
in [16] are defined by selection and union, that is, in our
notation, by formulae of either the form $\lambda(\bar{x}) \wedge \sigma(\bar{y})$ or
$\lambda(\bar{x}) \vee \sigma(\bar{y})$. As we remarked in Section 2, in such a case
losslessness can be checked by considering two views
$\lambda(\bar{x})$ and $\sigma(\bar{y})$ in place of each view of the latter form.